Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation

From: Calvin Owens

Date: Tue Apr 07 2026 - 13:42:01 EST


On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space:
>
> https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@xxxxxxxxxxxxx/
>
> He provided a reproducer, which sets up a timerfd based timer and then
> rearms it in a loop with an absolute expiry time of 1ns.

The original AMD machines survive the reproducer with this series.

Tested-by: Calvin Owens <calvin@xxxxxxxxxx>

I'm happy to test subsets of it and stable backports too, if that's
helpful, just let me know.

Thanks,
Calvin

> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
>
> The first patch in the series changes the clockevent set next event
> mechanism to prevent reprogramming of the clockevent device when the
> minimum delta value was programmed unless the new delta is larger than
> that. It's a less convoluted variant of the patch which was posted in the
> above linked thread and was confirmed to prevent the starvation problem.
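
As I read that description, the reprogramming rule can be modelled in user space roughly as below. This is only a sketch of the idea and not the code from the patch: struct clockevent_model, model_program_event() and model_interrupt() are hypothetical names, and the real clockevents code tracks considerably more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user space model of a clockevent device. */
struct clockevent_model {
	uint64_t min_delta_ns;       /* smallest delta the device accepts */
	bool min_delta_pending;      /* minimum delta programmed, not yet fired */
	unsigned long programmed;    /* number of device programming operations */
};

/*
 * Returns true if the device was (re)programmed, false if the request
 * was skipped because the minimum delta is already pending and the new
 * delta is not larger than that.
 */
static bool model_program_event(struct clockevent_model *dev, uint64_t delta_ns)
{
	if (delta_ns <= dev->min_delta_ns) {
		if (dev->min_delta_pending)
			return false;    /* let the pending interrupt fire */
		dev->min_delta_pending = true;
	} else {
		/* A larger delta may replace the pending minimum. */
		dev->min_delta_pending = false;
	}
	dev->programmed++;
	return true;
}

/* The timer interrupt fired, so nothing is pending anymore. */
static void model_interrupt(struct clockevent_model *dev)
{
	dev->min_delta_pending = false;
}
```

In this model, a tight loop of already expired requests programs the device once and then waits for the interrupt, so the interrupt is no longer starved.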
>
> But that's only to be considered the last resort because it results in an
> insane amount of avoidable hrtimer interrupts.
>
> The problem of user controlled timers is that the input value is only
> sanity checked vs. validity of the provided timespec and clamped to be in
> the maximum allowable range. But for performance reasons for in-kernel
> usage, there is no check whether a to-be-armed timer might already have
> expired at enqueue time.
>
> The rest of the series addresses this by providing a separate interface to
> arm user controlled timers. This works the same way as the existing
> hrtimer_start_range_ns(), but in case that the timer ends up as the first
> timer in the clock base after enqueue it provides additional checks:
>
>   - Whether the timer becomes the first expiring timer in the CPU base.
>
>     If not the timer is considered to expire in the future as there is
>     already an earlier event programmed.
>
>   - Whether the timer has expired already by comparing the expiry value
>     against current time.
>
>     If it is expired, the timer is removed from the clock base and the
>     function returns false, so that the caller can handle it. That's
>     required because the function cannot invoke the callback as that
>     might need to acquire a lock which is held by the caller.
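
To check my understanding of the two checks above, here is a small user space model of the arming logic. It is a sketch under assumptions and not the interface from the series: struct base_model, struct timer_model and model_start_user_timer() are hypothetical names, and the model ignores locking, slack and multiple clock bases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per CPU timer base. */
struct base_model {
	uint64_t now_ns;         /* current time */
	uint64_t next_event_ns;  /* earliest programmed expiry, UINT64_MAX if none */
};

/* Hypothetical model of a single timer. */
struct timer_model {
	uint64_t expires_ns;
	bool enqueued;
};

/*
 * Returns true if the timer is armed, false if it had already expired
 * and was removed again so that the caller can handle the expiry.
 */
static bool model_start_user_timer(struct base_model *base,
				   struct timer_model *timer,
				   uint64_t expires_ns)
{
	timer->expires_ns = expires_ns;
	timer->enqueued = true;

	/* Check 1: not the first expiring timer, an earlier event is programmed. */
	if (expires_ns >= base->next_event_ns)
		return true;

	/* Check 2: first expiring timer, compare against current time. */
	if (expires_ns <= base->now_ns) {
		timer->enqueued = false;   /* remove it, do not invoke the callback */
		return false;
	}

	base->next_event_ns = expires_ns;  /* reprogram for the new first timer */
	return true;
}
```

On a false return the caller would run its own expiry handling, for example marking the timerfd as expired while still holding its lock, instead of relying on the callback.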
>
> This function is then used for the user controlled timer arming interfaces
> mainly by converting hrtimer sleeper over to it. That affects a few
> in-kernel users too, but the overhead is minimal in that case and it
> spares a tedious whack-a-mole game all over the tree.
>
> The other usage sites in posixtimers, alarmtimers and timerfd are converted
> as well, which should cover the vast majority of user space controllable
> timers as far as my investigation goes.
>
> The series applies against Linus' tree and is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1
>
> There needs to be some discussion about the scope of backporting. The first
> patch preventing the stall is obviously a backport candidate. The remaining
> series can be obviously argued about, but in my opinion it should be
> backported as well as it prevents stupid or malicious user space from
> generating tons of pointless timer interrupts.
>
> Thanks,
>
> tglx
> ---
>  drivers/power/supply/charger-manager.c |  12 +-
>  fs/timerfd.c                           | 115 +++++++++++++++-----------
>  include/linux/alarmtimer.h             |   9 +-
>  include/linux/clockchips.h             |   2
>  include/linux/hrtimer.h                |  20 +++-
>  include/trace/events/timer.h           |  13 +++
>  kernel/time/alarmtimer.c               |  70 +++++++---------
>  kernel/time/clockevents.c              |  23 +++--
>  kernel/time/hrtimer.c                  | 142 +++++++++++++++++++++++++++++----
>  kernel/time/posix-cpu-timers.c         |  18 ++--
>  kernel/time/posix-timers.c             |  35 +++++---
>  kernel/time/posix-timers.h             |   4
>  kernel/time/tick-common.c              |   1
>  kernel/time/tick-sched.c               |   1
>  net/netfilter/xt_IDLETIMER.c           |  24 ++++-
>  15 files changed, 341 insertions(+), 148 deletions(-)
>
>