Re: [PATCH v1 2/2] cpuidle: governors: teo: Rearrange stopped tick handling

From: Rafael J. Wysocki

Date: Fri Feb 20 2026 - 06:12:29 EST


On Fri, Feb 20, 2026 at 11:16 AM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 2/18/26 18:37, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > This change is based on the observation that it is not in fact necessary
> > to select a deep idle state every time the scheduler tick has been
> > stopped before the idle state selection takes place. Namely, if the
> > time till the closest timer (that is not the tick) is short enough,
> > a shallow idle state can be selected because the timer will kick the
> > CPU out of that state, so the damage from a possible overly optimistic
> > selection will be limited.
> >
> > Update the teo governor in accordance with the above in analogy with
> > the previous analogous menu governor update.
> >
> > Among other things, this will cause the teo governor to call
> > tick_nohz_get_sleep_length() every time when the tick has been
> > stopped already and only change the original idle state selection
> > if the time till the closest timer is beyond SAFE_TIMER_RANGE_NS
> > which is way more straightforward than the current code flow.
> >
> > Of course, this effectively throws away some of the recent teo governor
> > changes, but the resulting simplification is worth it in my view.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> > drivers/cpuidle/governors/teo.c | 80 ++++++++++++++++------------------------
> > 1 file changed, 33 insertions(+), 47 deletions(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -413,50 +413,13 @@ static int teo_select(struct cpuidle_dri
> > * better choice.
> > */
> > if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
> > - int min_idx = idx0;
> > -
> > - if (tick_nohz_tick_stopped()) {
> > - /*
> > - * Look for the shallowest idle state below the current
> > - * candidate one whose target residency is at least
> > - * equal to the tick period length.
> > - */
> > - while (min_idx < idx &&
> > - drv->states[min_idx].target_residency_ns < TICK_NSEC)
> > - min_idx++;
> > -
> > - /*
> > - * Avoid selecting a state with a lower index, but with
> > - * the same target residency as the current candidate
> > - * one.
> > - */
> > - if (drv->states[min_idx].target_residency_ns ==
> > - drv->states[idx].target_residency_ns)
> > - goto constraint;
> > - }
> > -
> > - /*
> > - * If the minimum state index is greater than or equal to the
> > - * index of the state with the maximum intercepts metric and
> > - * the corresponding state is enabled, there is no need to look
> > - * at the deeper states.
> > - */
> > - if (min_idx >= intercept_max_idx &&
> > - !dev->states_usage[min_idx].disable) {
> > - idx = min_idx;
> > - goto constraint;
> > - }
> > -
> > /*
> > * Look for the deepest enabled idle state, at most as deep as
> > * the one with the maximum intercepts metric, whose target
> > * residency had not been greater than the idle duration in over
> > * a half of the relevant cases in the past.
> > - *
> > - * Take the possible duration limitation present if the tick
> > - * has been stopped already into account.
> > */
> > - for (i = idx - 1, intercept_sum = 0; i >= min_idx; i--) {
> > + for (i = idx - 1, intercept_sum = 0; i >= idx0; i--) {
> > intercept_sum += cpu_data->state_bins[i].intercepts;
> >
> > if (dev->states_usage[i].disable)
> > @@ -469,7 +432,6 @@ static int teo_select(struct cpuidle_dri
> > }
> > }
> >
> > -constraint:
> > /*
> > * If there is a latency constraint, it may be necessary to select an
> > * idle state shallower than the current candidate one.
> > @@ -478,13 +440,13 @@ constraint:
> > idx = constraint_idx;
> >
> > /*
> > - * If either the candidate state is state 0 or its target residency is
> > - * low enough, there is basically nothing more to do, but if the sleep
> > - * length is not updated, the subsequent wakeup will be counted as an
> > - * "intercept" which may be problematic in the cases when timer wakeups
> > - * are dominant. Namely, it may effectively prevent deeper idle states
> > - * from being selected at one point even if no imminent timers are
> > - * scheduled.
> > + * If the tick has not been stopped and either the candidate state is
> > + * state 0 or its target residency is low enough, there is basically
> > + * nothing more to do, but if the sleep length is not updated, the
> > + * subsequent wakeup will be counted as an "intercept". That may be
> > + * problematic in the cases when timer wakeups are dominant because it
> > + * may effectively prevent deeper idle states from being selected at one
> > + * point even if no imminent timers are scheduled.
> > *
> > * However, frequent timers in the RESIDENCY_THRESHOLD_NS range on one
> > * CPU are unlikely (user space has a default 50 us slack value for
> > @@ -500,7 +462,8 @@ constraint:
> > * shallow idle states regardless of the wakeup type, so the sleep
> > * length need not be known in that case.
> > */
> > - if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> > + if (!tick_nohz_tick_stopped() && (!idx ||
> > + drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
> > (2 * cpu_data->short_idles >= cpu_data->total ||
> > latency_req < LATENCY_THRESHOLD_NS))
> > goto out_tick;
> > @@ -508,6 +471,29 @@ constraint:
> > duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> > cpu_data->sleep_length_ns = duration_ns;
> >
> > + /*
> > + * If the tick has been stopped and the closest timer is too far away,
> > + * update the selection to prevent the CPU from getting stuck in a
> > + * shallow idle state for too long.
> > + */
> > + if (tick_nohz_tick_stopped() && duration_ns > SAFE_TIMER_RANGE_NS &&
> > + drv->states[idx].target_residency_ns < TICK_NSEC) {
> > + /*
> > + * Look for the deepest enabled idle state with target
> > + * residency within duration_ns.
> > + */
> > + for (i = drv->state_count - 1; i > idx; i--) {
>
> s/drv->state_count - 1/constraint_idx/
> to respect the latency_req here?

Good point, yes.

> > + if (dev->states_usage[i].disable)
> > + continue;
> > +
> > + if (drv->states[i].target_residency_ns <= duration_ns) {
> > + idx = i;
> > + break;
> > + }
> > + }
> > + return idx;
> > + }
> > +
> > if (!idx)
> > goto out_tick;