Re: [External] : Re: [PATCH 1/2] timer: add add_timer_active_cpu()

From: Partha Satapathy

Date: Mon Jun 08 2026 - 09:54:39 EST

Thanks for the review,

RDS already checks cpu_online() before calling add_timer_on().
However, that still races with CPU hotplug and can leave a timer
queued on a CPU which goes offline before the enqueue completes.

The current workaround is to serialize the operation with
cpus_read_lock(), but doing so in the networking path adds overhead
to every timer enqueue and increases CPU hotplug latency.

> On Thu, Apr 23 2026 at 09:19, Partha Satapathy wrote:
>> From: Partha Sarathi Satapathy <partha.satapathy@xxxxxxxxxx>
>>
>> add_timer_on() can queue a timer on a CPU that goes offline before the
>> timer is enqueued. When that happens, the timer remains unserviced until
>> the CPU comes back online.
>>
>> Callers can try to avoid that by checking CPU state themselves, but that
>> does not close the race with CPU hotplug. Taking the hotplug lock around
>> every enqueue is also too expensive for users that only want CPU-local
>> placement as a performance hint.
>>
>> Add add_timer_active_cpu() for callers that want to queue a timer on a
>
> Aside of Frederic's comments this is a patently bad function name as it
> suggests that the timer should be placed on any active CPU, which makes
> it more confusing as the function has a @cpu argument.

I agree that both names are poor. I will rename them to better reflect
that the target CPU is a preferred placement target rather than a
strict requirement.

>
>> @@ -1296,7 +1297,7 @@ EXPORT_SYMBOL(add_timer_global);
>> *
>> * See add_timer() for further details.
>> */
>> -void add_timer_on(struct timer_list *timer, int cpu)
>> +static void __add_timer_on(struct timer_list *timer, int cpu, bool need_ol_cpu)
>
> How is the kernel-doc comment still valid for that function?
>
> Also 'need_ol_cpu' is the most ridiculous argument name I've seen in a
> long time. My first read was: "Need [good] ol' CPU"
>
>> {
>> struct timer_base *new_base, *base;
>> unsigned long flags;
>> @@ -1333,6 +1334,18 @@ void add_timer_on(struct timer_list *timer, int cpu)
>> WRITE_ONCE(timer->flags,
>> (timer->flags & ~TIMER_BASEMASK) | cpu);
>> }
>> +#ifdef CONFIG_HOTPLUG_CPU
>> + if (need_ol_cpu) {
>
> This #ifdeffery is pointless and can be replaced with
>
> if (IS_ENABLED(CONFIG...) && ...)
>
> But that's just a cosmetic detail.

I will restore add_timer_on(), clean up the #ifdef using IS_ENABLED() as
you suggested.

>
>> + if (!base->is_active) {
>> + raw_spin_unlock(&base->lock);
>> + base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
>> + raw_spin_lock(&base->lock);
>> + cpu = smp_processor_id();
>> + WRITE_ONCE(timer->flags,
>> + (timer->flags & ~TIMER_BASEMASK) | cpu);
>
> This is broken vs. the TIMER_MIGRATION semantics.

Agreed. The intention was to signal back to the caller that the timer
did not enqueue on the requested CPU. That information is better
tracked through the CPU hotplug infrastructure, so I will remove this
logic in the next revision.

>
>> +void add_timer_on(struct timer_list *timer, int cpu)
>> +{
>> + bool need_ol_cpu = false;
>> +
>> + __add_timer_on(timer, cpu, need_ol_cpu);
>> +}
>> EXPORT_SYMBOL_GPL(add_timer_on);
>>
>> +/**
>> + * add_timer_active_cpu - Start a timer on a particular CPU if online or current
>> + * @timer: The timer to be started
>> + * @cpu: The CPU to start it on
>> + *
>> + * This is like add_timer_on(), except that it queues the timer on the
>> + * given CPU only when that CPU is online or on the current CPU.
>> + */
>> +void add_timer_active_cpu(struct timer_list *timer, int cpu)
>> +{
>> + bool need_ol_cpu = true;
>> +
>> + __add_timer_on(timer, cpu, need_ol_cpu);
>> +}
>> +EXPORT_SYMBOL_GPL(add_timer_active_cpu);
>
> We surely need more exports just for adding a misnamed argument.
>
>> +
>> /**
>> * __timer_delete - Internal function: Deactivate a timer
>> * @timer: The timer to be deactivated
>> @@ -2507,6 +2543,7 @@ int timers_prepare_cpu(unsigned int cpu)
>> base->next_expiry_recalc = false;
>> base->timers_pending = false;
>> base->is_idle = false;
>> + base->is_active = true;
>
> That's just wrong. The base is active only when the CPU reaches online state.

Setting base->is_active = true in timers_prepare_cpu() was
intentional as a small optimization to avoid an additional hotplug
callback. Timers queued in this window will be serviced as the CPU
completes bring-up.

However, to keep the state strictly aligned with CPU online state, I
can move this activation logic to a dedicated callback under
CPUHP_AP_ONLINE_DYN instead if you prefer.

>
>> @@ -2535,6 +2572,11 @@ int timers_dead_cpu(unsigned int cpu)
>>
>> WARN_ON_ONCE(old_base->running_timer);
>> old_base->running_timer = NULL;
>> + /*
>> + * Make the dead CPU base unavailable to add_timer_on()
>> + * when the caller wants an active target CPU.
>> + */
>> + old_base->is_active = false;
>
> After a CPU went offline no timer should be queued on it anymore and
> add_timer_on() clearly lacks a
>
> if (WARN_ON(cpu_offline(cpu)))
> return;
>
> If at all what you really want is an interface, which
>
> 1) allows to give a hint about the CPU to place the timer on
>
> 2) checks whether the hinted CPU is online (or even active) and places
> if on it if so
>
> 3) if #2 fails picks a CPU on the same node when available (online or
> active)
>
> 4) Picks any online CPU (maybe prefer the calling CPU)
>
> But let me look at your argument you provided to Frederic:
>
>> Commit 4c00f3b0dea023a9851908a501735c899b0772f9
>> ("net/rds: Add preferred_cpu option to rds_rdma.ko")
>> added `preferred_cpu` so follow-up work can be placed for locality and
>> performance, not because correctness requires that exact CPU.
>
> If RDS fails to update the preferred CPU logic once that CPU goes
> offline, then that's a clear bug in that code and should not be worked
> around elsewhere.
>
> Thanks,
>
> tglx

The RDS use case treats the target CPU primarily as a locality hint
rather than a strict execution requirement.

Given your outline above, this may be better addressed by a generic
placement-hint interface than by strict CPU placement semantics.

If that direction makes sense, I can respin the series and send a V2
addressing the issues raised above.

Thanks
Partha