Re: [PATCH] clockevents: Prevent timer interrupt starvation

From: Calvin Owens

Date: Fri Apr 03 2026 - 20:20:23 EST


On Friday 04/03 at 21:00 +0200, Thomas Gleixner wrote:
> On Fri, Apr 03 2026 at 08:58, Calvin Owens wrote:
> > On Friday 04/03 at 16:41 +0200, Thomas Gleixner wrote:
> >> I'm an idiot. When I polished the patch up, I dropped the hunks which
> >> clear the flag in the interrupt handler and tired brain did not notice
> >> despite checking five times in a row. Updated version below.
> >
> > That did it, both AMD machines survive the reproducer and are well
> > behaved afterwards.
>
> Thank you and sorry for the nuisance.
>
> > If you like:
> >
> > Tested-By: Calvin Owens <calvin@xxxxxxxxxx>
>
> I will probably post a slightly different version similar to the one I
> sent in the reply to Peter and if you have time then I would appreciate
> a tested-by on that final to be polished version.

I will take a look.

> Btw, I'm really curious how you deduced the reproducer from systemd
> code. I assume you figured somehow out which program triggered the
> behaviour and then inspected the source to find something fishy. Can you
> provide a pointer to the code in question? If they really do what your
> reproducer does, then this code needs to be fixed too :)

I pulled the text that was executing when the NMI fired out of the dump:

00 ba 38 03 00 00 48 8d 35 ce 40 18 00 48 8d 3d 16 41 18 00 e8 11 14
e8 ff b8 f4 ff ff ff e9 6d ff ff ff 0f 1f 80 00 00 00 00 0f b6 4f 2f
48 8d 15 e5 5f 26 00 48 89 c8 83 e0 03 48 c1 e0 05 48

...and searched for it in systemd-networkd and all its libs. It appears
in one spot in libsystemd-shared-259.so in path_hash_func(), so that
must be where the userspace %ip was when the NMI fired.
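
For anyone who wants to repeat that kind of search, a minimal sketch
follows. It is not the exact method used above: find_all() and the
command-line handling are illustrative names, and it assumes the code
bytes appear verbatim in the on-disk file.

```python
import sys
from pathlib import Path

# Code bytes copied from the dump quoted above.
CODE_HEX = """
00 ba 38 03 00 00 48 8d 35 ce 40 18 00 48 8d 3d 16 41 18 00 e8 11 14
e8 ff b8 f4 ff ff ff e9 6d ff ff ff 0f 1f 80 00 00 00 00 0f b6 4f 2f
48 8d 15 e5 5f 26 00 48 89 c8 83 e0 03 48 c1 e0 05 48
"""

# Strip all whitespace before decoding the hex pairs.
NEEDLE = bytes.fromhex("".join(CODE_HEX.split()))


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


if __name__ == "__main__":
    # Usage (hypothetical): python3 findcode.py <binary-or-library>...
    for path in sys.argv[1:]:
        for off in find_all(Path(path).read_bytes(), NEEDLE):
            print(f"{path}: file offset {off:#x}")
```

Each reported file offset still has to be mapped back to a symbol by
hand, for example by comparing it against a disassembly of the library.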

Unfortunately that has too many callers: I couldn't narrow it down
meaningfully from there. Despite staring at a lot of timer code in
systemd, I haven't yet found anything concrete that might cause buggy
behavior.

But it stuck out at me that the detritus on the stack wasn't futex() or
poll() or read() related. It seemed wildly improbable that the NMI
would have just happened to catch systemd-networkd running like that, so
I guessed it was probably spinning around timerfd_settime() in userspace
when the NMI fired (with calls to path_hash_func() somehow in-between).

My initial guess was that the trigger was something about waiting on the
timer in a different thread than it was set on. I started to write that
out as a small reproducer, but almost jokingly thought, "well, I should
just try setting them blindly first and see if that works", and then my
head exploded when it actually did :)

I've tried overloading the machine, and triggering some unrealistically
large time steps back and forth underneath it. But I can't get systemd
to stick itself in any sort of loop like that, or even set a single
timer expiry to an unreasonable value.

I think I will set up a little BPF thing to force systemd-networkd to
dump core if it makes timerfd_settime() calls too quickly or with
abstime arguments in the past. Hopefully I can work out from the core
what was going on. But any better suggestions are welcome.
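
The check itself could look roughly like the sketch below. This is
plain Python that only models the predicate, not a BPF program, and
everything in it is an assumption: the name is_suspicious(), the
nanosecond units, and the MIN_INTERVAL_NS threshold are all made up for
illustration.

```python
from typing import Optional

# Hypothetical threshold: calls closer together than this count as
# "too quickly". The value would need tuning on a real machine.
MIN_INTERVAL_NS = 1_000


def is_suspicious(now_ns: int,
                  abstime_ns: int,
                  prev_call_ns: Optional[int],
                  min_interval_ns: int = MIN_INTERVAL_NS) -> bool:
    """Decide whether one timerfd_settime() call should trigger a dump.

    now_ns       current time on the timer's clock
    abstime_ns   absolute expiry requested by the call
    prev_call_ns time of the previous call, or None for the first one
    """
    # An all-zero it_value disarms the timer, so it is not a past expiry.
    in_past = abstime_ns != 0 and abstime_ns <= now_ns
    too_fast = (prev_call_ns is not None
                and now_ns - prev_call_ns < min_interval_ns)
    return in_past or too_fast
```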

Thanks,
Calvin