Re: [PATCH] clockevents: Prevent timer interrupt starvation

From: Thomas Gleixner

Date: Wed Apr 08 2026 - 04:52:35 EST

On Fri, Apr 03 2026 at 17:15, Calvin Owens wrote:
> On Friday 04/03 at 21:00 +0200, Thomas Gleixner wrote:
>> Btw, I'm really curious how you deduced the reproducer from systemd
>> code. I assume you figured somehow out which program triggered the
>> behaviour and then inspected the source to find something fishy. Can you
>> provide a pointer to the code in question? If they really do what your
>> reproducer does, then this code needs to be fixed too :)
>
> I pulled the text that was executing when the NMI fired out of the dump:
>
> 00 ba 38 03 00 00 48 8d 35 ce 40 18 00 48 8d 3d 16 41 18 00 e8 11 14
> e8 ff b8 f4 ff ff ff e9 6d ff ff ff 0f 1f 80 00 00 00 00 0f b6 4f 2f
> 48 8d 15 e5 5f 26 00 48 89 c8 83 e0 03 48 c1 e0 05 48
>
> ...and searched for it in systemd-networkd and all its libs. It appears
> in one spot in libsystemd-shared-259.so in path_hash_func(), so that
> must be where the userspace %ip was when the NMI fired.

Amazing.

> Unfortunately that has too many callers: I couldn't narrow it down
> meaningfully from there. Despite staring at a lot of timer code in
> systemd, I haven't yet found anything concrete that might cause buggy
> behavior.
>
> But, it stuck out at me that the detritus on the stack wasn't futex() or
> poll() or read() related. It seemed wildly improbable that the NMI
> would have just happened to catch systemd-networkd running like that, I
> guessed it was probably spinning around timerfd_settime() in userspace
> when the NMI fired (with calls to path_hash_func() somehow in-between).

Right, and there is an explicit timerfd_settime(... { 0, 1 }) in the
event management code.

> My initial guess was that the trigger was something about waiting on the
> timer in a different thread than it was set on. I started to write that
> out as a small reproducer, but almost jokingly thought, "well, I should
> just try setting them blindly first and see if that works", and then my
> head exploded when it actually did :)

:)

> I've tried overloading the machine, and triggering some unrealistically
> large time steps back and forth underneath it. But I can't get systemd
> to stick itself in any sort of loop like that, or even set a single
> timer expiry to an unreasonable value.
>
> I think I will set up a little BPF thing to force systemd-networkd to
> dump core if it makes timerfd_settime() calls too quickly or with
> abstime arguments in the past, hopefully from the core I can work out
> what was going on. But any better suggestions are welcome.

It just occurred to me that with the hrtimer changes, you might be able
to utilize the new hrtimer_start_expires tracepoint and enable user
stack traces to get down to the actual root cause.

Thanks,

tglx