Re: [patch 01/12] clockevents: Prevent timer interrupt starvation

From: Thomas Gleixner

Date: Tue Apr 07 2026 - 12:16:57 EST

On Tue, Apr 07 2026 at 16:00, Frederic Weisbecker wrote:
> On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner <tglx@xxxxxxxxxx>
>>
>> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
>> up in user space. He provided a reproducer, which sets up a timerfd based
>> timer and then rearms it in a loop with an absolute expiry time of 1ns.
>>
>> As the expiry time is in the past, the timer ends up as the first expiring
>> timer in the per CPU hrtimer base and the clockevent device is programmed
>> with the minimum delta value. If the machine is fast enough, this ends up
>> in an endless loop of programming the delta value to the minimum value
>> defined by the clock event device, before the timer interrupt can fire,
>> which starves the interrupt and consequently triggers the lockup detector
>> because the hrtimer callback of the lockup mechanism is never invoked.
>>
>> As a first step to prevent this, avoid reprogramming the clock event device
>> when:
>> - a forced minimum delta event is pending
>> - the new expiry delta is less than or equal to the minimum delta
>>
>> Thanks to Calvin for providing the reproducer and to Borislav for testing
>> and providing data from his Zen5 machine.
>>
>> The problem is not limited to Zen5, but depending on the underlying
>> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
>> it is not necessarily observable.
>>
>> This change serves only as the last resort and further changes will be made
>> to prevent this scenario earlier in the call chain as far as possible.
>>
>> Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
>> Reported-by: Calvin Owens <calvin@xxxxxxxxxx>
>> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Anna-Maria Behnsen <anna-maria@xxxxxxxxxxxxx>
>> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@xxxxxxxxxxxxx/
>> ---
>> V2: Simplified the clockevents code - Peter
>
> Isn't it possible to rely on dev->next_event instead? In the above scenario,
> subsequent 0 delta would not reprogram if dev->next_event is already below
> the new call to ktime_get() ?

It does if force is set and that is set when hrtimer calls into it:

	if (delta <= 0)
		return force ? clockevents_program_min_delta(dev) : -ETIME;

I can't change that for various reasons.

But we always need the flag which tells us that the programming was
forced in order to prevent the above scenario. And delta <= 0 is not the
only way to achieve that. You can have a delta > 0 and < min_delta
and achieve the same effect. That needs more effort on the callsite, but
it's trivially doable as the time from the system call to the
reprogramming is pretty constant.

As I had to introduce the flag and prevent the other scenario I just
consolidated everything into one code path.
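
For illustration only, a minimal userspace sketch of such a consolidated
path. This is a model under stated assumptions, not the patch itself:
struct model_dev, its fields and all model_*() functions are hypothetical
names, the "hardware" is just a write counter, and clearing the flag when
a longer delta is programmed is an assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state, not the kernel struct. */
struct model_dev {
	int64_t min_delta_ns;	/* smallest delta the device accepts */
	bool forced_pending;	/* a minimum delta event is already armed */
	unsigned int hw_writes;	/* how often the "hardware" was programmed */
};

static int model_program_event(struct model_dev *dev, int64_t delta_ns, bool force)
{
	/* An already expired event is only programmed when forced. */
	if (delta_ns <= 0 && !force)
		return -ETIME;

	/* Regular case: the delta is long enough to be programmed as is. */
	if (delta_ns > dev->min_delta_ns) {
		dev->forced_pending = false;	/* assumption of this model */
		dev->hw_writes++;
		return 0;
	}

	/*
	 * Covers both delta <= 0 and 0 < delta <= min_delta: if a minimum
	 * delta event is pending, leave the device alone so the interrupt
	 * gets a chance to fire.
	 */
	if (dev->forced_pending)
		return 0;

	/* Arm the minimum delta once and remember that it was done. */
	dev->forced_pending = true;
	dev->hw_writes++;
	return 0;
}

/* The modelled timer interrupt fired: the pending event is consumed. */
static void model_interrupt(struct model_dev *dev)
{
	dev->forced_pending = false;
}

/* Mimics the reproducer: rearm with the same short delta in a tight loop. */
static unsigned int model_rearm_loop(struct model_dev *dev, int64_t delta_ns, int iterations)
{
	for (int i = 0; i < iterations; i++)
		model_program_event(dev, delta_ns, true);
	return dev->hw_writes;
}
```

In this model a tight rearm loop writes the device once, no matter
whether the delta is 0 or somewhere between 0 and min_delta. The next
write happens only after model_interrupt() has run.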

Thanks,

tglx