Query about timer wheel API

From: imran . f . khan
Date: Sun Dec 22 2024 - 19:15:09 EST


Hello Thomas,

Could you kindly help me with a query about the timer wheel APIs?
Right now we use add_timer or add_timer_on to add a timer on any CPU
or on a specific CPU, respectively.
Would it be useful to have an interface like try_add_timer_on that
would either return an error or fall back to add_timer if the specified
CPU is offline?
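
To make the idea concrete, here is a minimal user-space sketch of the two
behaviours mentioned above. It is only a model, not kernel code:
struct timer_list, cpu_online, add_timer and add_timer_on are small
stand-ins for the real kernel symbols, while try_add_timer_on,
add_timer_on_or_any, the NR_CPUS value and the -ENODEV return code are
hypothetical choices made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4	/* assumed value, for the model only */

/* Stand-in for the kernel's struct timer_list; only what the sketch needs. */
struct timer_list {
	int cpu;	/* CPU the timer is bound to, -1 if not bound */
	bool pending;	/* true once the timer has been armed */
};

/* Model of the online mask: CPU 2 is offline in this example. */
static const bool cpu_online_map[NR_CPUS] = { true, true, false, true };

bool cpu_online(int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS && cpu_online_map[cpu];
}

/* Stand-in: arm the timer without binding it to a particular CPU. */
void add_timer(struct timer_list *timer)
{
	timer->cpu = -1;
	timer->pending = true;
}

/* Stand-in: arm the timer on the given CPU, with no online check. */
void add_timer_on(struct timer_list *timer, int cpu)
{
	timer->cpu = cpu;
	timer->pending = true;
}

/*
 * Hypothetical variant 1: refuse to arm the timer on an offline CPU.
 * In the real kernel this check would race with CPU hotplug unless the
 * caller holds something like cpus_read_lock().
 */
int try_add_timer_on(struct timer_list *timer, int cpu)
{
	if (!cpu_online(cpu))
		return -ENODEV;
	add_timer_on(timer, cpu);
	return 0;
}

/* Hypothetical variant 2: fall back to add_timer if the CPU is offline. */
void add_timer_on_or_any(struct timer_list *timer, int cpu)
{
	if (try_add_timer_on(timer, cpu))
		add_timer(timer);
}
```

With the first variant the caller has to handle the error itself; with the
second the timer always gets armed, but not necessarily on the requested
CPU.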

Recently we came across some bugs in the RDS code where delayed work
was being queued on an offline CPU. As a result, the underlying timer
never fired, which in turn meant that the work never made it to the
intended worker_pool.

I understand that this needs to be fixed on the caller side, and we are
taking that approach.

But I also wanted to understand whether there is scope for a change on the
timer side for such situations. I saw your reply in [1] and agree with your
point. But that conversation is more than a decade old, so I thought I would
ask, in case there are other use cases that could benefit from such an
interface.

One could also change queue_delayed_work_on, or add an equivalent, to check
that the CPU is online before calling add_timer_on, but I am not sure whether
workqueue is the only subsystem that can run into this situation.
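
For comparison, the workqueue-side alternative could look roughly like the
user-space model below. Again this is only a sketch under assumptions:
delayed_work_model, model_queue_delayed_work_on and the online table are
stand-ins, queue_delayed_work_on_checked is a hypothetical name, and
falling back to the unbound CPU value is just one possible policy.

```c
#include <stdbool.h>

#define WQ_NR_CPUS	4		/* assumed value, for the model only */
#define WQ_CPU_UNBOUND	WQ_NR_CPUS	/* models the kernel's WORK_CPU_UNBOUND */

/* Stand-in for struct delayed_work; only what the sketch needs. */
struct delayed_work_model {
	int cpu;	/* CPU the work was queued for */
	bool queued;	/* true once the work is pending */
};

/* Model of the online mask: CPU 1 is offline in this example. */
static const bool wq_cpu_is_online[WQ_NR_CPUS] = { true, false, true, true };

/*
 * Stand-in for queue_delayed_work_on: returns false if the work is
 * already pending, true if it was queued. It does no online check.
 */
bool model_queue_delayed_work_on(int cpu, struct delayed_work_model *dwork)
{
	if (dwork->queued)
		return false;
	dwork->cpu = cpu;
	dwork->queued = true;
	return true;
}

/*
 * Hypothetical wrapper: if the requested CPU is offline, queue the work
 * as unbound instead, so that its timer is not armed on an offline CPU.
 * Real code would also have to guard against CPU hotplug races.
 */
bool queue_delayed_work_on_checked(int cpu, struct delayed_work_model *dwork)
{
	if (cpu < 0 || cpu >= WQ_NR_CPUS || !wq_cpu_is_online[cpu])
		cpu = WQ_CPU_UNBOUND;
	return model_queue_delayed_work_on(cpu, dwork);
}
```

This keeps the check in one place for workqueue users, but it would not help
any other subsystem that calls add_timer_on directly.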

Thanks in advance for your help,
Imran

[1]: https://lists.linuxcoding.com/kernel/2007-q4/msg27627.html