Re: [PATCH v10 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
From: Ankur Arora
Date: Thu Apr 02 2026 - 03:10:51 EST
Ankur Arora <ankur.a.arora@xxxxxxxxxx> writes:
> Hi,
>
> This series adds waited variants of the smp_cond_load() primitives:
> smp_cond_load_relaxed_timeout(), and smp_cond_load_acquire_timeout().
>
> With this version, the main remaining things are:
>
> - Review by PeterZ of the new interface tif_need_resched_relaxed_wait()
> (patch 11, "sched: add need-resched timed wait interface").
>
> - Review of the BPF changes. This version simplifies the rqspinlock
> changes by reusing the original error handling path
> (patches 9, 10 "bpf/rqspinlock: switch check_timeout() to a clock
> interface", "bpf/rqspinlock: Use smp_cond_load_acquire_timeout()").
>
> - Review of WFET handling. (patch 4, "arm64: support WFET in
> smp_cond_load_relaxed_timeout()").
Received Acks/Reviewed-bys for the two items above.
Will send out an updated v11 which adds a testcase and a comment
noting that smp_cond_load_*_timeout() on MMIO addresses might
break in interesting platform-specific ways.
Thanks
Ankur
> The new interfaces are meant for contexts where you want to wait on a
> condition variable for a finite duration. This is easy enough to do with
> a loop around cpu_relax(). There are, however, architectures (ex. arm64)
> that allow waiting on a cacheline instead.
>
> So, these interfaces handle a mixture of spin/wait with a
> smp_cond_load() thrown in. The interfaces are:
>
> smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
> smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
>
> The parameters time_expr and timeout determine when to bail out.
>
> Also add tif_need_resched_relaxed_wait(), which wraps the pattern used
> in poll_idle() and abstracts out the details of the interface and of
> the scheduler.
>
> In addition add atomic_cond_read_*_timeout(), atomic64_cond_read_*_timeout(),
> and atomic_long wrappers to the interfaces.
>
> Finally update poll_idle() and resilient queued spinlocks to use them.
>
> Changelog:
> v9 [9]:
> - s/@cond/@cond_expr/ (Randy Dunlap)
> - Clarify that SMP_TIMEOUT_POLL_COUNT only applies when polling memory
> addresses. (David Laight)
> - Add the missing config ARCH_HAS_CPU_RELAX in arch/arm64/Kconfig.
> (Catalin Marinas).
> - Switch to arch_counter_get_cntvct_stable() (via __delay_cycles())
> in the cmpwait path instead of using arch_timer_read_counter().
> (Catalin Marinas)
>
> v8 [0]:
> - Defer evaluation of @time_expr_ns to when we hit the slowpath.
> (comment from Alexei Starovoitov).
>
> - Mention that cpu_poll_relax() is better than raw CPU polling
> only where ARCH_HAS_CPU_RELAX is defined.
> - also define ARCH_HAS_CPU_RELAX for arm64.
> (Came out of a discussion with Will Deacon.)
>
> - Split out WFET and WFE handling. I was doing both of these
> in a common handler.
> (From Will Deacon and in an earlier revision by Catalin Marinas.)
>
> - Add mentions of atomic_cond_read_{relaxed,acquire}(),
> atomic_cond_read_{relaxed,acquire}_timeout() in
> Documentation/atomic_t.txt.
>
> - Use the BIT() macro to do the checking in tif_bitset_relaxed_wait().
>
> - Cleanup unnecessary assignments, casts etc in poll_idle().
> (From Rafael Wysocki.)
>
> - Fixup warnings from kernel build robot
>
>
> v7 [1]:
> - change the interface to separately provide the timeout. This is
> useful for supporting WFET and similar primitives which can do
> timed waiting (suggested by Arnd Bergmann).
>
> - Adapting rqspinlock code to this changed interface also
> necessitated allowing time_expr to fail.
> - rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
>
> - add WFET support (suggested by Arnd Bergmann).
> - add support for atomic-long wrappers.
> - add a new scheduler interface tif_need_resched_relaxed_wait() which
> encapsulates the polling logic used by poll_idle().
> - interface suggested by Rafael J. Wysocki.
>
>
> v6 [2]:
> - fixup missing timeout parameters in atomic64_cond_read_*_timeout()
> - remove a race between setting of TIF_NEED_RESCHED and the call to
> smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
> would be set even if we hadn't spent any time waiting.
> (The original check compared against local_clock(), which would have been
> fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
> (Both from meta-CI bot)
>
>
> v5 [3]:
> - use cpu_poll_relax() instead of cpu_relax().
> - instead of defining an arm64 specific
> smp_cond_load_relaxed_timeout(), just define the appropriate
> cpu_poll_relax().
> - re-read the target pointer when we exit due to the time-check.
> - s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
> (Suggested by Will Deacon)
>
> - add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
> interfaces.
> - rqspinlock: use atomic_cond_read_acquire_timeout().
> - cpuidle: use smp_cond_load_relaxed_timeout() for polling.
> (Suggested by Catalin Marinas)
>
> - rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
>
>
> v4 [4]:
> - naming change 's/timewait/timeout/'
> - resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
> and fixup use of RES_CHECK_TIMEOUT().
> (Both suggested by Catalin Marinas)
>
> v3 [5]:
> - further interface simplifications (suggested by Catalin Marinas)
>
> v2 [6]:
> - simplified the interface (suggested by Catalin Marinas)
> - get rid of wait_policy, and a multitude of constants
> - adds a slack parameter
> This helped remove a fair amount of code duplication and, in
> hindsight, unnecessary constants.
>
> v1 [7]:
> - add wait_policy (coarse and fine)
> - derive spin-count etc at runtime instead of using arbitrary
> constants.
>
> Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [8]
>
> Comments appreciated!
>
> Thanks
> Ankur
>
> [0] https://lore.kernel.org/lkml/20251215044919.460086-1-ankur.a.arora@xxxxxxxxxx/
> [1] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@xxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@xxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@xxxxxxxxxx/
> [4] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@xxxxxxxxxx/
> [5] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@xxxxxxxxxx/
> [6] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@xxxxxxxxxx/
> [7] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@xxxxxxxxxx/
> [8] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@xxxxxxxxxx/
> [9] https://lore.kernel.org/lkml/20260209023153.2661784-1-ankur.a.arora@xxxxxxxxxx/
>
> Cc: Arnd Bergmann <arnd@xxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> Cc: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: bpf@xxxxxxxxxxxxxxx
> Cc: linux-arch@xxxxxxxxxxxxxxx
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx
>
> Ankur Arora (12):
> asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
> arm64: barrier: Support smp_cond_load_relaxed_timeout()
> arm64/delay: move some constants out to a separate header
> arm64: support WFET in smp_cond_load_relaxed_timeout()
> arm64: rqspinlock: Remove private copy of
> smp_cond_load_acquire_timewait()
> asm-generic: barrier: Add smp_cond_load_acquire_timeout()
> atomic: Add atomic_cond_read_*_timeout()
> locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
> bpf/rqspinlock: switch check_timeout() to a clock interface
> bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
> sched: add need-resched timed wait interface
> cpuidle/poll_state: Wait for need-resched via
> tif_need_resched_relaxed_wait()
>
> Documentation/atomic_t.txt | 14 +++--
> arch/arm64/Kconfig | 3 +
> arch/arm64/include/asm/barrier.h | 23 +++++++
> arch/arm64/include/asm/cmpxchg.h | 62 +++++++++++++++----
> arch/arm64/include/asm/delay-const.h | 27 +++++++++
> arch/arm64/include/asm/rqspinlock.h | 85 --------------------------
> arch/arm64/lib/delay.c | 15 ++---
> drivers/cpuidle/poll_state.c | 21 +------
> drivers/soc/qcom/rpmh-rsc.c | 8 +--
> include/asm-generic/barrier.h | 90 ++++++++++++++++++++++++++++
> include/linux/atomic.h | 10 ++++
> include/linux/atomic/atomic-long.h | 18 +++---
> include/linux/sched/idle.h | 29 +++++++++
> kernel/bpf/rqspinlock.c | 77 +++++++++++++++---------
> scripts/atomic/gen-atomic-long.sh | 16 +++--
> 15 files changed, 320 insertions(+), 178 deletions(-)
> create mode 100644 arch/arm64/include/asm/delay-const.h
--
ankur