Re: [External] : Re: [PATCH 0/2] timers/workqueue: Add support for active CPU

From: Partha Satapathy

Date: Mon Apr 27 2026 - 12:53:33 EST

Thanks for the feedback.
One concrete example is our downstream RDS/RDMA code.

Commit 4c00f3b0dea023a9851908a501735c899b0772f9
("net/rds: Add preferred_cpu option to rds_rdma.ko")
added `preferred_cpu` so follow-up work can be placed for locality and
performance, not because correctness requires that exact CPU.

The motivation was to keep workers near the CPU where the completion
was observed, that is, the CPU where the HCA's queues are mapped, so
that completion handling and follow-up work remain local. With
`preferred_cpu=cq`, work is steered toward the CPU
matching the CQ interrupt affinity, which helps latency and cache
locality. With `preferred_cpu=numa`, work is kept on CPUs local to the
HCA's NUMA node, which helps scale-out while preserving locality to the
device and memory.

So here the chosen CPU is only a preference. If that CPU goes offline
in the window between selection and queueing, the timer backing the
delayed work can end up queued on a CPU that is no longer active, and
execution is then postponed until that CPU comes back.

For this class of user, fallback to an active CPU is preferable,
because the placement is an optimization, not a correctness requirement.

I also do not think this is specific to RDS. In general, network and
RDMA drivers often choose CPUs based on IRQ affinity, queue affinity, or
NUMA locality to improve latency and throughput, while the work itself
can still run on any online CPU. For that kind of workload, "active if
possible, fallback otherwise" looks like a useful generic behavior.
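
As a rough illustration of that policy, here is a minimal userspace C
sketch. It is not the code from the series and does not use any kernel
API: the names cpu_mask_t, cpu_is_active() and pick_active_cpu() are
hypothetical, and a plain 64-bit mask stands in for the kernel's notion
of an active CPU. It only models the selection rule "use the preferred
CPU if it is active, otherwise use the current CPU".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: bit N set means CPU N is active. */
typedef uint64_t cpu_mask_t;

/* True if 'cpu' is in range and marked active in the mask. */
static bool cpu_is_active(cpu_mask_t active, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((active >> cpu) & 1u) != 0;
}

/*
 * Return the preferred CPU when it is active, otherwise fall back to
 * the caller's current CPU. The placement is a hint, not a guarantee.
 */
static int pick_active_cpu(cpu_mask_t active, int preferred, int current)
{
	return cpu_is_active(active, preferred) ? preferred : current;
}
```

With a mask of 0x0b (CPUs 0, 1 and 3 active, CPU 2 offline) and the
caller on CPU 0, a preference for CPU 3 is honoured, while a preference
for CPU 2 falls back to CPU 0. Note that this model does not capture the
hard part, which is making the check and the enqueue atomic with respect
to hotplug.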

That said, I agree that a downstream-only example by itself may not be
enough to justify a mainline core API change. I mentioned RDS mainly to
answer your question about whether there are real users that choose CPUs
for performance/locality rather than strict correctness.

On 23-04-2026 17:35, Frederic Weisbecker wrote:
> Hi,
>
> On Thu, Apr 23, 2026 at 09:19:05AM +0000, Partha Satapathy wrote:
>> From: Partha Sarathi Satapathy <partha.satapathy@xxxxxxxxxx>
>>
>> Hi,
>>
>> Timers queued with add_timer_on() and delayed work queued with
>> queue_delayed_work_on() currently rely on the caller to ensure that the
>> target CPU remains online until the enqueue operation completes. In
>> practice, CPU hotplug can still race with that sequence and leave the
>> timer queued on an offline CPU, where it will not run until that CPU
>> comes back online.
>>
>> For delayed work, this has a direct knock-on effect: if the backing
>> timer is stranded on an offline CPU, the work item is never queued for
>> execution until that CPU returns.
>>
>> In many cases, the target CPU is chosen for locality and cache affinity
>> rather than as a strict execution requirement. Falling back to an active
>> CPU is preferable to leaving the timer or delayed work blocked on a dead
>> CPU. While callers can try to track CPU hotplug state themselves, that
>> does not close the race, and taking the hotplug lock around enqueue
>> operations is too expensive for this class of use.
>>
>> This series adds opt-in helpers for that fallback behavior without
>> changing the semantics of the existing interfaces:
>>
>> - add_timer_active_cpu() queues a timer on the requested CPU only if
>>   the target CPU's timer base is active; otherwise it falls back to
>>   the current CPU.
>>
>> - queue_delayed_work_active_cpu() uses the new timer helper for the
>>   delayed timer path and updates dwork->cpu to reflect the CPU
>>   actually selected for the timer, so the work item is queued on the
>>   same active CPU.
>>
>> The existing add_timer_on() and queue_delayed_work_on() behavior is left
>> unchanged for callers that require strict CPU placement.
>
> Timers are migrated when CPUs go offline. So the problem is queueing
> a timer to an offline CPU. It should be the responsibility of a subsystem
> to synchronize with CPU hotplug in order to avoid that.
>
> As for timers that are queued locally not for correctness but for performance
> reasons, do we know of such an example?
>
> Thanks.
>