Re: [PATCH V1 1/1] tick: broadcast-hrtimer: Fix a race in bc_set_next

From: Thomas Gleixner
Date: Mon Sep 23 2019 - 15:52:04 EST


On Wed, 18 Sep 2019, Balasubramani Vivekanandan wrote:
>
> When there are no more cpus subscribed to broadcast, the timer callback
> might not set the expiry time for hrtimer. Therefore the callback timer
> function is modified to set the state of broadcast clock to
> CLOCK_EVT_STATE_ONESHOT_STOPPED which in turn will set the expiry time
> of hrtimer to KTIME_MAX.

That's an ugly layering violation, really.

Aside of that the whole try to cancel logic and the comment that the
hrtimer cannot be armed from within the callback is wrong. All of this can
go way.

Find a completely untested patch below.

Thanks,

tglx

8<---------------------
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,39 +42,35 @@ static int bc_shutdown(struct clock_even
*/
static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
{
- int bc_moved;
/*
- * We try to cancel the timer first. If the callback is on
- * flight on some other cpu then we let it handle it. If we
- * were able to cancel the timer nothing can rearm it as we
- * own broadcast_lock.
+ * This is called either from enter/exit idle code or from the
+ * broadcast handler. In all cases tick_broadcast_lock is held.
*
- * However we can also be called from the event handler of
- * ce_broadcast_hrtimer itself when it expires. We cannot
- * restart the timer because we are in the callback, but we
- * can set the expiry time and let the callback return
- * HRTIMER_RESTART.
+ * hrtimer_cancel() cannot be called here neither from the
+ * broadcast handler nor from the enter/exit idle code. The idle
+ * code can run into the problem described in bc_shutdown() and the
+ * broadcast handler cannot wait for itself to complete for obvious
+ * reasons.
*
- * Since we are in the idle loop at this point and because
- * hrtimer_{start/cancel} functions call into tracing,
- * calls to these functions must be bound within RCU_NONIDLE.
+ * Each caller tries to arm the hrtimer on its own CPU, but if the
+ * handler is currently running hrtimer_start() does not move
+ * it. It keeps it on the CPU on which it is assigned at the
+ * moment.
*/
- RCU_NONIDLE(
- {
- bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
- if (bc_moved) {
- hrtimer_start(&bctimer, expires,
- HRTIMER_MODE_ABS_PINNED_HARD);
- }
- }
- );
-
- if (bc_moved) {
- /* Bind the "device" to the cpu */
- bc->bound_on = smp_processor_id();
- } else if (bc->bound_on == smp_processor_id()) {
- hrtimer_set_expires(&bctimer, expires);
- }
+ RCU_NONIDLE( {
+ hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+ /*
+ * The core tick broadcast mode expects bc->bound_on to be set
+ * correctly to prevent a CPU which has the broadcast hrtimer
+ * armed from going deep idle.
+ *
+ * As tick_broadcast_lock is held, nothing can change the
+ * cpu base which was just established in hrtimer_start()
+ * above. So the below access is safe even without holding
+ * the hrtimer base lock.
+ */
+ bc->bound_on = bctimer.base->cpu_base->cpu;
+ } );
return 0;
}

@@ -100,10 +96,6 @@ static enum hrtimer_restart bc_handler(s
{
ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);

- if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
- if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
- return HRTIMER_RESTART;
-
return HRTIMER_NORESTART;
}