[Question] timers: trigger_dyntick_cpu() vs TIMER_DEFERRABLE

From: Valentin Schneider
Date: Mon Jul 25 2022 - 05:32:53 EST


Hi,

I've been incidentally staring at some NOHZ bits related to the timer
wheels, and trigger_dyntick_cpu() confuses me:

static void
trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
{
[...]
/*
* TODO: This wants some optimizing similar to the code below, but we
* will do that when we switch from push to pull for deferrable timers.
*/
if ((timer->flags & TIMER_DEFERRABLE)) {
if (tick_nohz_full_cpu(base->cpu))
wake_up_nohz_cpu(base->cpu);
return;
}
[...]
}

>From what I grok out of get_nohz_timer_target(), under
timers_migration_enabled we should migrate the timer to an non-idle CPU
(or at the very least a non-isolated CPU) *before* enqueuing the
timer. Without timers_migration_enabled (or if TIMER_PINNED), I don't see
anything that could migrate the timer elsewhere, so:

Why bother kicking a NOHZ CPU for a deferrable timer if it is the next
expiring one? Per the definition:

* @TIMER_DEFERRABLE: A deferrable timer will work normally when the
* system is busy, but will not cause a CPU to come out of idle just
* to service it; instead, the timer will be serviced when the CPU
* eventually wakes up with a subsequent non-deferrable timer.

I tried to find some discussion over this in LKML, but found nothing.
v3 of the patch did *not* kick a CPU for a deferrable timer, but v4 (the
one that ended up merged) did (see below). Patch in question is:

a683f390b93f ("timers: Forward the wheel clock whenever possible")

Thanks

====
v3
====
@@ -520,23 +522,27 @@ static void internal_add_timer(struct ti
__internal_add_timer(base, timer);

/*
- * Check whether the other CPU is in dynticks mode and needs
- * to be triggered to reevaluate the timer wheel. We are
- * protected against the other CPU fiddling with the timer by
- * holding the timer base lock. This also makes sure that a
- * CPU on the way to stop its tick can not evaluate the timer
- * wheel.
- *
- * Spare the IPI for deferrable timers on idle targets though.
- * The next busy ticks will take care of it. Except full dynticks
- * require special care against races with idle_cpu(), lets deal
- * with that later.
- */
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active) {
- if (!(timer->flags & TIMER_DEFERRABLE) ||
- tick_nohz_full_cpu(base->cpu))
- wake_up_nohz_cpu(base->cpu);
- }
+ * We might have to IPI the remote CPU if the base is idle and the
+ * timer is not deferrable. If the other cpu is on the way to idle
+ * then it can't set base->is_idle as we hold base lock.
+ */
+ if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->is_idle ||
+ (timer->flags & TIMER_DEFERRABLE))
+ return;
+
+ /* Check whether this is the new first expiring timer */
+ if (time_after_eq(timer->expires, base->next_expiry))
+ return;
+ base->next_expiry = timer->expires;
+
+ /*
+ * Check whether the other CPU is in dynticks mode and needs to be
+ * triggered to reevaluate the timer wheel. We are protected against
+ * the other CPU fiddling with the timer by holding the timer base
+ * lock.
+ */
+ if (tick_nohz_full_cpu(base->cpu))
+ wake_up_nohz_cpu(base->cpu);
}
====
v4
====
@@ -519,24 +521,37 @@ static void internal_add_timer(struct ti
{
__internal_add_timer(base, timer);

+ if (!IS_ENABLED(CONFIG_NO_HZ_COMMON) || !base->nohz_active)
+ return;
+
/*
- * Check whether the other CPU is in dynticks mode and needs
- * to be triggered to reevaluate the timer wheel. We are
- * protected against the other CPU fiddling with the timer by
- * holding the timer base lock. This also makes sure that a
- * CPU on the way to stop its tick can not evaluate the timer
- * wheel.
- *
- * Spare the IPI for deferrable timers on idle targets though.
- * The next busy ticks will take care of it. Except full dynticks
- * require special care against races with idle_cpu(), lets deal
- * with that later.
- */
- if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active) {
- if (!(timer->flags & TIMER_DEFERRABLE) ||
- tick_nohz_full_cpu(base->cpu))
+ * This wants some optimizing similar to the below, but we do that
+ * when we switch from push to pull for deferrable timers.
+ */
+ if (timer->flags & TIMER_DEFERRABLE) {
+ if (tick_nohz_full_cpu(base->cpu))
wake_up_nohz_cpu(base->cpu);
+ return;
}
+
+ /*
+ * We might have to IPI the remote CPU if the base is idle and the
+ * timer is not deferrable. If the other cpu is on the way to idle
+ * then it can't set base->is_idle as we hold base lock.
+ */
+ if (!base->is_idle)
+ return;
+
+ /* Check whether this is the new first expiring timer */
+ if (time_after_eq(timer->expires, base->next_expiry))
+ return;
+
+ /*
+ * Set the next expiry time and kick the cpu so it can reevaluate the
+ * wheel
+ */
+ base->next_expiry = timer->expires;
+ wake_up_nohz_cpu(base->cpu);
}