Re: [PATCH 1/4] asm-generic: barrier: Add smp_cond_load_relaxed_timewait()

From: Catalin Marinas
Date: Tue Mar 04 2025 - 14:28:05 EST


On Mon, Feb 03, 2025 at 01:49:08PM -0800, Ankur Arora wrote:
> Add smp_cond_load_relaxed_timewait(), a timed variant of
> smp_cond_load_relaxed(). This is useful for cases where we want to
> wait on a conditional variable but don't want to wait indefinitely.

Bikeshedding: why not "timeout" rather than "timewait"?

> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index d4f581c1e21d..31de8ed2a05e 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -273,6 +273,54 @@ do { \
> })
> #endif
>
> +#ifndef smp_cond_time_check_count
> +/*
> + * Limit how often smp_cond_load_relaxed_timewait() evaluates time_expr_ns.
> + * This helps reduce the number of instructions executed while spin-waiting.
> + */
> +#define smp_cond_time_check_count 200
> +#endif

While this was indeed added to the poll_idle() loop, it feels completely
random in a generic implementation. It's highly dependent on the
time_expr_ns passed. Can the caller not move the loop in time_expr_ns
before invoking this macro?

> +
> +/**
> + * smp_cond_load_relaxed_timewait() - (Spin) wait for cond with no ordering
> + * guarantees until a timeout expires.
> + * @ptr: pointer to the variable to wait on
> + * @cond: boolean expression to wait for
> + * @time_expr_ns: evaluates to the current time
> + * @time_limit_ns: compared against time_expr_ns
> + *
> + * Equivalent to using READ_ONCE() on the condition variable.
> + *
> + * Note that the time check in time_expr_ns can be synchronous or
> + * asynchronous.
> + * In the generic version the check is synchronous but kept coarse
> + * to minimize instructions executed while spin-waiting.
> + */

Not sure exactly what synchronous vs asynchronous here mean. I see the
latter more like an interrupt. I guess what you have in mind is the WFE
wakeup events on arm64, though they don't interrupt the instruction
flow. I'd not bother specifying this at all.

> +#ifndef __smp_cond_load_relaxed_spinwait
> +#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \
> + time_limit_ns) ({ \
> + typeof(ptr) __PTR = (ptr); \
> + __unqual_scalar_typeof(*ptr) VAL; \
> + unsigned int __count = 0; \
> + for (;;) { \
> + VAL = READ_ONCE(*__PTR); \
> + if (cond_expr) \
> + break; \
> + cpu_relax(); \
> + if (__count++ < smp_cond_time_check_count) \
> + continue; \
> + if ((time_expr_ns) >= (time_limit_ns)) \
> + break; \
> + __count = 0; \
> + } \
> + (typeof(*ptr))VAL; \
> +})
> +#endif
> +
> +#ifndef smp_cond_load_relaxed_timewait
> +#define smp_cond_load_relaxed_timewait __smp_cond_load_relaxed_spinwait
> +#endif

What I don't particularly like about this interface is (1) no idea of
what time granularity it offers, how much it can slip past the deadline,
even though there's some nanoseconds implied and (2) time_expr_ns leaves
the caller to figure out why time function to use for tracking the time.
Well, I can be ok with (2) if we make it a bit more generic.

The way it is written, I guess the type of the time expression and limit
no longer matters as long as you can compare them. The naming implies
nanoseconds but we don't have such precision, especially with the WFE
implementation for arm64. We could add a slack range argument like the
delta_ns for some of the hrtimer API and let the arch code decide
whether to honour it.

What about we drop the time_limit_ns and build it into the time_expr_ns
as a 'time_cond' argument? The latter would return the result of some
comparison and the loop bails out if true. An additional argument would
be the minimum granularity for checking the time_cond and the arch code
may decide to fall back to busy loops if the granularity is larger than
what the caller required.

--
Catalin