Re: [PATCH v1 1/4] cpuidle: teo: Add polling flag check to early return path

From: Rafael J. Wysocki
Date: Fri Jan 10 2025 - 08:35:12 EST


On Fri, Jan 10, 2025 at 2:16 PM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 1/10/25 12:53, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > After commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
> > call in some cases") the teo governor behaves a bit differently on
> > systems where idle state 0 is a "polling" state (that is, it is not
> > really an idle state, but a loop continuously executed by the CPU).
> > Namely, on such systems it skips the tick_nohz_get_sleep_length() call
> > if the target residency of the current candidate idle state is small
> > enough.
> >
> > However, if state 0 itself was to be returned, it would be returned
> > right away without calling tick_nohz_get_sleep_length() even on systems
> > where it was not a "polling" state until commit 4b20b07ce72f ("cpuidle:
> > teo: Don't count non-existent intercepts") that attempted to fix this
> > problem.
> >
> > Unfortunately, commit 4b20b07ce72f has made the governor always call
> > tick_nohz_get_sleep_length() when about to return state 0 early, even
> > if that state is a "polling" one, which is inconsistent and defeats
> > the purpose of commit 6da8f9ba5a87 in that case.
> >
> > Address this by adding a CPUIDLE_FLAG_POLLING check to the path where
> > state 0 is returned early to prevent tick_nohz_get_sleep_length() from
> > being called if it is a "polling" state.
> >
> > Fixes: 4b20b07ce72f ("cpuidle: teo: Don't count non-existent intercepts")
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> > drivers/cpuidle/governors/teo.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- a/drivers/cpuidle/governors/teo.c
> > +++ b/drivers/cpuidle/governors/teo.c
> > @@ -422,7 +422,8 @@
> > first_suitable_idx = i;
> > }
> > }
> > - if (!idx && prev_intercept_idx) {
> > + if (!idx && prev_intercept_idx &&
> > + !(drv->states[0].flags & CPUIDLE_FLAG_POLLING)) {
> > /*
> > * We have to query the sleep length here otherwise we don't
> > * know after wakeup if our guess was correct.
> >
> >
> >
>
> But then you do run into the issue of intercepts not being detected if
> state0 is the right choice, don't you?

That's true, but then on systems with a "polling" state 0 you still
have this problem if the state returned early is not state 0. Say C1
on x86.

The point here is that the behavior needs to be consistent, one way or another.

The exact point of commit 6da8f9ba5a87 was to avoid calling
tick_nohz_get_sleep_length() in some cases when the state to be
returned is shallow enough and obviously that includes a "polling"
state 0, possibly at the cost of being somewhat inaccurate in
prediction.

Then you're seeing this intercept accumulation for state 0 when there
are only 2 states in the table (or all of the other states are much
higher target residency than state 0).

Commit 4b20b07ce72f effectively caused tick_nohz_get_sleep_length() to
be called every time on systems without a "polling" state 0, which was
fair enough, but it also affected the other systems, which wasn't.

> This would then enable intercept-detection only for <50% of the time,
> another option is to not allow intercepts selecting a polling state, but
> there were recent complaints about this exact behavior from Aboorva (+TO).
> They don't have a low-latency non-polling state.
>
> https://lore.kernel.org/lkml/20240809073120.250974-1-aboorvad@xxxxxxxxxxxxx/

If they don't have a "polling" state 0, they won't be affected by this
patch and after commit 4b20b07ce72f, they'll always call
tick_nohz_get_sleep_length(), so the current governor behavior is
generally unsuitable for them.

I have an idea how to change it to be more accurate in prediction, but
we'll see how it goes. Stay tuned.