Re: [PATCH v8 01/11] cpuidle/poll_state: poll via smp_cond_load_relaxed()

From: Catalin Marinas
Date: Tue Oct 15 2024 - 08:04:29 EST


On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote:
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..fc1204426158 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>
> raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> - unsigned int loop_count = 0;
> u64 limit;
>
> limit = cpuidle_poll_time(drv, dev);
>
> while (!need_resched()) {
> - cpu_relax();
> - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> - continue;
> -
> - loop_count = 0;
> + unsigned int loop_count = 0;
> if (local_clock_noinstr() - time_start > limit) {
> dev->poll_time_limit = true;
> break;
> }
> +
> + smp_cond_load_relaxed(&current_thread_info()->flags,
> + VAL & _TIF_NEED_RESCHED ||
> + loop_count++ >= POLL_IDLE_RELAX_COUNT);

The above is not guaranteed to make progress if _TIF_NEED_RESCHED is
never set. With the event stream enabled on arm64, the WFE will
eventually be woken up, loop_count incremented and the condition would
become true. However, the smp_cond_load_relaxed() semantics require that
a different agent updates the variable being waited on, not the waiting
CPU updating it itself. Also note that the event stream can be disabled
on arm64 on the kernel command line.

Does the code above break any other architecture? I'd say if you want
something like this, better introduce a new smp_cond_load_timeout()
API. The above looks like a hack that may only work on arm64 when the
event stream is enabled.

A generic option is udelay() (on arm64 it would use WFE/WFET by
default). Not sure how important it is for poll_idle() but the downside
of udelay() that it won't be able to also poll need_resched() while
waiting for the timeout. If this matters, you could instead make smaller
udelay() calls. Yet another problem, I don't know how energy efficient
udelay() is on x86 vs cpu_relax().

So maybe an smp_cond_load_timeout() would be better, implemented with
cpu_relax() generically and the arm64 would use LDXR, WFE and rely on
the event stream (or fall back to cpu_relax() if the event stream is
disabled).

--
Catalin