Re: [patch 5/5] clocksource: Rewrite watchdog code completely
From: Daniel J Blueman
Date: Sun Feb 15 2026 - 07:18:54 EST
On Mon, 2 Feb 2026 at 19:27, Thomas Gleixner <tglx@xxxxxxxxxx> wrote:
> >> +/* Maximum time between two watchdog readouts */
> >> +#define WATCHDOG_READOUT_MAX_NS (50 * NSEC_PER_USEC)
>
> > At 1920 threads, the default timeout threshold of 20us triggers
> > continuous warnings at idle, however 1000us causes none under an 8
> > hour adverse workload [1]; no HPET fallback was seen. A 500us
> > threshold causes a low rate of timeouts [2] (overhead amplified due to
> > retries), thus 1000us adds margin and should prevent retries.
>
> Right. Idle is definitely an issue when the remote CPU is in a deep
> C-state.
>
> My concern is that the control CPU might spin there for a millisecond
> with interrupts disabled, which is not really desired especially not on
> RT systems.
>
> Something like the untested below delta patch should work.
Good step forward! We can also reduce remote cacheline invalidation by
moving 'seq' into the cacheline after 'cpu_ts', reordering the struct:
struct watchdog_cpu_data {
	atomic_t		remote_inprogress;
	struct clocksource	*cs;
	enum wd_result		result;
	u64			cpu_ts[2];
	call_single_data_t	csd;
	atomic_t		seq;	/* Keep in second cacheline to elide
					   unnecessary invalidation */
};
and reordering the inner loop:
	for (int seq = local + 2; seq < WATCHDOG_REMOTE_MAX_SEQ; seq += 2) {
		if (!watchdog_wait_seq(wd, start, seq))
			return;

		/* Capture local timestamp before possible non-local
		   coherency overhead */
		now = cs->read(cs);
		/* Store local timestamp before reading remote to limit
		   coherency stalls */
		wd->cpu_ts[local] = now;
		prev = wd->cpu_ts[remote];

		delta = (now - prev) & cs->mask;
		if (delta > cs->max_raw_delta) {
			watchdog_set_result(wd, WD_CPU_SKEWED);
With that said, with your latest change on the 1920-thread setup,
WATCHDOG_READOUT_MAX_US 1000 is still needed to avoid timeouts under
the previous adverse workload; some timeouts are still seen during
massive parallel process teardown.
To limit overhead, it may be sufficient to set the timeout to 100us,
avoid retries (the hardware thread may continue to be busy and will
be rechecked later anyway), and log timeouts at debug level, if at
all.
Thanks,
Dan
--
Daniel J Blueman