Re: the stuttering regression in 7.0: should I have done something different

From: Tony Rodriguez

Date: Thu May 14 2026 - 03:24:54 EST


Hi Thomas,

Cheers!

Initial validation of the test patches for v7.0.6 and 7.1-rc3 on the S7-2 looks promising: I have not observed panics, timer delays, or other timer-related issues so far. I will pause broader validation on the S7-2 and T7-1 until I receive your recommendation or any requested revisions (see inline comments below).

Note: I did see an intermittent error on the S7-2 running 7.1-rc3, usually when the system is under heavy load during a kernel build. I’m not sure whether it is a separate problem:

"[676.464681] BUG: Bad rss-counter state mm:000000008d9f1cf2 type:MM_FILEPAGES val:-4096 Comm:cc1 Pid:78165".

On 5/13/26 1:28 PM, Thomas Gleixner wrote:

Just to be clear: I never saw the VHDL code of that CPU, but that
pattern is way too familiar.

Those equal comparators, which were designed by AI (Absence of
Intelligence) before AI got popular, generally work this way:

The comparator is only evaluated on the clock edge which increments
the counter, but not when the comparator value is written. So a write
of the same value does not result in an interrupt.

That's an "optimization" which spares quite a few gates and is obviously
nowhere documented. So software has to deal with the consequences by
using a crystal ball, which is trivial to get wrong and can go unnoticed
for a long time until it rears its ugly head at some point for whatever
reasons.

I'm willing to bet a round of beers at the next conference that this is
the problem and that it will magically disappear when you change that
condition to:

return (read_cnt() - exp) >= 0 ? -ETIME : 0;

I tried to locate "return (read_cnt() - exp) >= 0 ? -ETIME : 0;" but could not find an exact match. After further inspection, I changed the comparison in tick_add_compare() and stick_add_compare() in arch/sparc/kernel/time_64.c from "> 0L" to ">= 0L". This appears to have resolved the lost-timer behavior.

--- time_64.c.orig
+++ time_64.c
@@ -146,7 +146,7 @@
                             : "=r" (new_tick));
        new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
 }

 static unsigned long tick_add_tick(unsigned long adj)
@@ -277,7 +277,7 @@
                             : "=r" (new_tick));
        new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
 }

 static unsigned long stick_get_frequency(void)
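
For anyone following along, here is a minimal userspace sketch of the failure mode described above. It is a hypothetical model, not the actual SPARC hardware or kernel code: struct timer_model, clock_edge(), write_cmp(), add_compare() and event_lost() are names invented for illustration, and the "latency" parameter is an assumption that stands in for the ticks that elapse between reading the counter and the compare write taking effect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: free-running counter with an "equal" comparator. */
struct timer_model {
	uint64_t count;		/* free-running counter */
	uint64_t cmp;		/* compare register */
	int irq_pending;	/* set when the comparator matches */
};

/* The comparator is evaluated only on the edge that increments the counter. */
static void clock_edge(struct timer_model *t)
{
	t->count++;
	if (t->count == t->cmp)
		t->irq_pending = 1;
}

/* Writing the compare register does not evaluate the comparator. */
static void write_cmp(struct timer_model *t, uint64_t val)
{
	t->cmp = val;
}

/*
 * Program an event adj ticks ahead. Returns nonzero if the deadline is
 * reported as already missed. use_ge selects ">= 0" instead of "> 0".
 */
static int add_compare(struct timer_model *t, uint64_t adj,
		       uint64_t latency, int use_ge)
{
	uint64_t target = t->count + adj;
	uint64_t i;
	int64_t diff;

	assert(adj > 0);
	for (i = 0; i < latency; i++)	/* time passes before the write lands */
		clock_edge(t);
	write_cmp(t, target);

	/* Signed difference keeps the comparison safe across counter wrap. */
	diff = (int64_t)(t->count - target);
	return use_ge ? diff >= 0 : diff > 0;
}

/* Returns 1 if the event was reported as programmed but never fired. */
int event_lost(uint64_t adj, uint64_t latency, int use_ge)
{
	struct timer_model t = { .count = 1000, .cmp = 0, .irq_pending = 0 };
	int missed = add_compare(&t, adj, latency, use_ge);
	int i;

	for (i = 0; i < 100000; i++)
		clock_edge(&t);
	return !missed && !t.irq_pending;
}
```

In this model, event_lost(5, 5, 0) returns 1: the write lands exactly when the counter equals the target, the "> 0" check reports success, and the interrupt never fires. event_lost(5, 5, 1) returns 0 because the ">= 0" check reports the miss, so the caller can retry with a later deadline.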


unless they managed to add some extra propagation delay to that
comparator write like the HPET folks did at some point without telling
anyone. I doubt the SPARC janitor who implemented it did so because
that would have made the failure way more likely.

I have truly no idea why the original code did not expose this problem,
though it might have been just papered over by sheer luck and timing.

Thanks,

tglx
---
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -381,6 +381,8 @@ int clockevents_program_event(struct clo
        if (dev->set_next_event(dev->min_delta_ticks, dev)) {
                if (!force || clockevents_program_min_delta(dev))
                        return -ETIME;
+       } else if (delta <= 0) {
+               dev->next_event = ktime_add_ns(ktime_get(), dev->min_delta_ns);
        }
        dev->next_event_forced = 1;
        return 0;

You mentioned that this kernel/time/clockevents.c patch is optional, but I would like to propose a revised version for clockevents_program_event(): if the requested expiry is already at or before the current time, record a sane next_event (now + min_delta_ns) so that the core code sees an expected expiry in the future rather than one in the past. Does this seem reasonable?

--- clockevents.c.orig
+++ clockevents.c
@@ -347,6 +347,11 @@
        if (dev->set_next_event(dev->min_delta_ticks, dev)) {
                if (!force || clockevents_program_min_delta(dev))
                        return -ETIME;
+       } else {
+               ktime_t now = ktime_get();
+               s64 delta_ns = ktime_to_ns(ktime_sub(expires, now));
+               if (delta_ns <= 0)
+                       dev->next_event = ktime_add_ns(now, dev->min_delta_ns);
        }
        dev->next_event_forced = 1;
        return 0;
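
To make the intent of the else branch easier to review, here is a small userspace sketch of just the bookkeeping. It is a simplified model under stated assumptions, not kernel code: ktime_model_t, struct clockevent_model and record_next_event() are hypothetical stand-ins, times are plain nanosecond integers, and the "expires in the future" branch assumes next_event would otherwise simply hold the requested expiry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ktime_t: signed nanoseconds. */
typedef int64_t ktime_model_t;

/* Hypothetical subset of a clock event device's state. */
struct clockevent_model {
	ktime_model_t next_event;	/* when the core expects the event */
	int64_t min_delta_ns;		/* smallest programmable delta */
};

/*
 * Called after the minimum delta was programmed successfully.
 * If the requested expiry is at or before 'now', record now + min_delta
 * so the expected event time lies in the future.
 */
void record_next_event(struct clockevent_model *dev,
		       ktime_model_t expires, ktime_model_t now)
{
	assert(dev->min_delta_ns > 0);

	if (expires - now <= 0)
		dev->next_event = now + dev->min_delta_ns;
	else
		dev->next_event = expires;	/* assumption: keep the request */
}
```

For example, with min_delta_ns = 1000 and now = 2000, a request for expires = 500 or expires = 2000 records next_event = 3000, while a request for expires = 5000 is kept unchanged.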