Re: [PATCH 2/2] timers: introduce timer_try_add_on_cpu.

From: Thomas Gleixner
Date: Wed Jan 15 2025 - 11:04:52 EST


On Thu, Jan 16 2025 at 00:41, Imran Khan wrote:
> + * Return:
> + * * %true - If timer was started on an online cpu
> + * * %false - If the specified cpu was offline or if its online status
> + * could not be ensured due to unavailability of hotplug lock.
> + */
> +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu)
> +{
> + bool ret = true;
> +
> + if (unlikely(!cpu_online(cpu)))
> + ret = false;
> + else if (cpus_read_trylock()) {
> + if (likely(cpu_online(cpu)))
> + add_timer_on(timer, cpu);
> + else
> + ret = false;
> + cpus_read_unlock();
> + } else
> + ret = false;
> +
> + return ret;

Aside of the horrible coding style, that cpus_read_trylock() part does
not make any sense.

It's perfectly valid to queue a timer on a online CPU when the CPU
hotplug lock is held write, which can have tons of reasons even
unrelated to an actual CPU hotplug operation.

Even during a hotplug operation adding a timer on a particular CPU is
valid, whether that's the CPU which is actually plugged or not is
irrelevant.

So if we add such a function, then it needs to have very precisely
defined semantics, which have to be independent of the CPU hotplug lock.

The only way I can imagine is that the state is part of the per CPU
timer base, but then I have to ask the question what is actually tried
to solve here.

As far as I understood that there is an issue in the RDS code, queueing
a delayed work on a offline CPU, but that should have triggered at least
the warning in __queue_delayed_work(), right?

So the question is whether this try() interface is solving any of this
and not papering over the CPU hotplug related issues in the RDS code in
some way.

Thanks,

tglx