Question: One-jiffy latency from the checking in run_local_timers()

From: zhuqiuer1
Date: Mon May 20 2024 - 09:21:03 EST


Hi there, the function "kernel/time/timer.c:run_local_timers" avoids raising a softirq when there are no timers set to expire at the current time.
It achieves this by comparing the current "jiffies" with "base->next_expiry".
However, on SMP systems, some CPUs may read jiffies while another CPU is still incrementing it.
Those CPUs may see the old jiffies value in "run_local_timers" and therefore fail to invoke timers that expire at the new jiffy.
This results in a one-jiffy latency for those timers. Could we simply add 1 to the "jiffies" value when comparing it with next_expiry?
This may raise an unnecessary softirq when a timer only expires in the next jiffy, but it would remove the one-jiffy latency.
I am not sure whether this is a worthwhile trade-off.
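
To make the comparison concrete, here is a minimal userspace sketch of the two checks being discussed. It is not the kernel code: the helper names should_raise_now() and should_raise_plus_one() are made up for illustration, and the comparison only mirrors the wraparound-safe signed-difference style of the kernel's time_after_eq() macro, using 64-bit values as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for a wraparound-safe "a is at or after b" test. */
#define model_time_after_eq(a, b) ((int64_t)((a) - (b)) >= 0)

/*
 * Hypothetical model of the current check: raise the softirq only
 * if the sampled jiffies value has reached next_expiry.
 */
static inline bool should_raise_now(uint64_t jiffies_seen, uint64_t next_expiry)
{
	return model_time_after_eq(jiffies_seen, next_expiry);
}

/*
 * Hypothetical model of the proposed check: compare jiffies + 1,
 * so a sample taken just before the increment still triggers.
 */
static inline bool should_raise_plus_one(uint64_t jiffies_seen, uint64_t next_expiry)
{
	return model_time_after_eq(jiffies_seen + 1, next_expiry);
}
```

In this model, a stale sample (jiffies_seen = N while the real value is already N + 1) and a fresh sample (the real value is still N) look identical to the check when next_expiry is N + 1. The "+ 1" variant returns true in both cases, which is where the possible extra softirq comes from.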

Below is an example we found in which a few CPUs read the old jiffies value while cpu-0 is updating jiffies:

<idle>-0 [000] d.h. 133.492480: do_timer: updated_jiffies: 4294950645
<idle>-0 [010] d.h. 133.492480: run_local_timers: base->next_expiry: 5368691712, jiffies: 4294950644
<idle>-0 [001] d.h. 133.492480: run_local_timers: base->next_expiry: 4294950645, jiffies: 4294950644
...
<idle>-0 [006] d.h. 133.492481: run_local_timers: base->next_expiry: 4294967808, jiffies: 4294950645
...

We found that in this case the timer on cpu-1 was invoked in the following jiffy rather than in the jiffy at which it was expected to expire.
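
As a rough cross-check, the three run_local_timers samples from the trace can be replayed through a small userspace model. This is only a sketch: struct trace_sample, expiry_reached() and count_raised() are hypothetical names, the values are copied from the trace lines above, and the comparison is a stand-in for the kernel's wraparound-safe jiffies comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One run_local_timers sample, as printed in the trace above. */
struct trace_sample {
	int cpu;
	uint64_t next_expiry;
	uint64_t jiffies_seen;
};

static const struct trace_sample samples[] = {
	{ 10, 5368691712ULL, 4294950644ULL },
	{  1, 4294950645ULL, 4294950644ULL },
	{  6, 4294967808ULL, 4294950645ULL },
};

/* Wraparound-safe "now is at or after next_expiry" test. */
static inline bool expiry_reached(uint64_t now, uint64_t next_expiry)
{
	return (int64_t)(now - next_expiry) >= 0;
}

/*
 * Count the samples for which the check would fire when `bias`
 * is added to the sampled jiffies value (0 = current check,
 * 1 = the proposed "jiffies + 1" check).
 */
static inline size_t count_raised(uint64_t bias)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		if (expiry_reached(samples[i].jiffies_seen + bias,
				   samples[i].next_expiry))
			n++;
	}
	return n;
}
```

With these values, count_raised(0) finds no CPU that would raise the softirq, while count_raised(1) finds exactly one, the cpu-1 sample. The samples from cpu-10 and cpu-6 have a next_expiry far in the future and are unaffected by the bias.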