Re: [PATCH AUTOSEL 6.18] clockevents: Prevent timer interrupt starvation
From: Thomas Gleixner
Date: Mon Apr 20 2026 - 11:07:52 EST
On Mon, Apr 20 2026 at 09:09, Sasha Levin wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxx>
>
> [ Upstream commit d6e152d905bdb1f32f9d99775e2f453350399a6a ]
>
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space. He provided a reproducer, which sets up a timerfd based
> timer and then rearms it in a loop with an absolute expiry time of 1ns.
>
> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
>
> As a first step to prevent this, avoid reprogramming the clock event device
> when:
> - a forced minimum delta event is pending
> - the new expiry delta is less than or equal to the minimum delta
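
The two conditions above can be illustrated with a small user-space model. It is a sketch only and not the kernel code: struct ced_model and both functions are hypothetical, only the flag name next_event_forced is borrowed from the follow-up commit title. The model assumes that both conditions must hold together for the reprogramming to be skipped and that the flag is cleared when the interrupt fires or a later event is programmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a clock event device, for illustration only */
struct ced_model {
	int64_t min_delta_ns;		/* smallest programmable delta */
	bool next_event_forced;		/* a forced minimum delta event is pending */
	unsigned long reprogram_count;	/* how often the "hardware" was written */
};

/* Returns true if the device was (re)programmed, false if it was skipped */
bool ced_model_program(struct ced_model *dev, int64_t delta_ns)
{
	if (delta_ns <= dev->min_delta_ns) {
		/*
		 * A forced minimum delta event is already pending. Programming
		 * it again would only push the interrupt further out.
		 */
		if (dev->next_event_forced)
			return false;

		/* Force the minimum delta once and remember that it is pending */
		dev->next_event_forced = true;
		dev->reprogram_count++;
		return true;
	}

	/* Assumption: a regular event beyond the minimum delta clears the flag */
	dev->next_event_forced = false;
	dev->reprogram_count++;
	return true;
}

/* Assumption: the interrupt handler clears the pending forced event */
void ced_model_interrupt(struct ced_model *dev)
{
	dev->next_event_forced = false;
}
```

In this model a tight loop of already expired requests writes the device once and is skipped from then on until ced_model_interrupt() runs, which is the property the changelog asks for.
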
>
> Thanks to Calvin for providing the reproducer and to Borislav for testing
> and providing data from his Zen5 machine.
>
> The problem is not limited to Zen5, but depending on the underlying
> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
> it is not necessarily observable.
>
> This change serves only as the last resort and further changes will be made
> to prevent this scenario earlier in the call chain as far as possible.
>
> [ tglx: Updated to restore the old behaviour vs. !force and delta <= 0 and
> fixed up the tick-broadcast handlers as pointed out by Borislav ]
>
> Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
Please hold that off until
4096fd0e8eae ("clockevents: Add missing resets of the next_event_forced flag")
hits Linus' tree. It fixes the above commit and is marked for stable, so
ideally you apply them together.

4096fd0e8eae will not apply to 7.0 and older. I'll provide you with an
updated version once Linus has pulled it.
Thanks,
tglx