Re: [patch 5/5] clocksource: Rewrite watchdog code completely

From: Daniel J Blueman

Date: Sun Mar 15 2026 - 10:59:50 EST

On Mon, 23 Feb 2026 at 21:53, Thomas Gleixner <tglx@xxxxxxxxxx> wrote:
>
> On Sun, Feb 15 2026 at 20:18, Daniel J Blueman wrote:
> > On Mon, 2 Feb 2026 at 19:27, Thomas Gleixner <tglx@xxxxxxxxxx> wrote:
> > Good step forward! We can also reduce remote cacheline invalidation by
> > putting 'seq' into the cacheline after 'cpu_ts' by reordering:
>
> Good point.
>
> > With that said, with your latest change on the 1920 thread setup,
> > WATCHDOG_READOUT_MAX_US 1000 is still needed to avoid timeouts during
> > the previous adverse workload, however some timeouts are still seen
> > during massive parallel process teardowns.
> >
> > To limit overhead, perhaps it is sufficient to set the timeout to
> > 100us, avoid retries (as the hardware thread may continue to be busy
> > and will be rechecked later anyway), and log timeouts at the debug
> > level if at all.
>
> Something like the below should work even with 50us. I left the print at
> INFO level for now. We can either change it to pr_info_once() or to
> debug as you said.

My apologies for the delays!

Comparing the execution time of the existing mainline
clocksource_watchdog() to the proposed approach, there isn't
significant additional overhead [1], which is excellent.

With that said, on the 16-socket (1920-thread) setup, most remote
calls end up timing out with WATCHDOG_READOUT_MAX_US at 50, leading to
excessive logging. pr_info_once() would be a good way to avoid the
spam; however, I still feel we should use a higher timeout
(250-500us?) to keep the mechanism effective.

I also feel that if a remote hardware thread is seen to time out,
retrying has a high likelihood of timing out as well, so it may be
cheaper in the bigger picture not to retry. Sensitivity could be
increased by walking threads in socket order (S0T0 ... S15T0 S0T1 ...
S15T1 ...). These two items are my only concerns.
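
To illustrate the socket-order walk, here is a minimal sketch in C. It
is not taken from the patch: the function name is hypothetical, and it
assumes thread IDs are numbered socket-major (cpu = socket *
threads_per_socket + thread) with the 1920 threads spread evenly over
16 sockets, i.e. 120 per socket. Real code would have to consult the
topology masks instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map the n-th watchdog step to a CPU number so
 * that consecutive steps land on different sockets
 * (S0T0, S1T0, ..., S15T0, S0T1, ...).
 *
 * Assumption (not from the patch): CPUs are numbered socket-major,
 * cpu = socket * threads_per_socket + thread.
 */
static size_t socket_interleaved_cpu(size_t step, size_t nr_sockets,
				     size_t threads_per_socket)
{
	assert(nr_sockets > 0 && threads_per_socket > 0);

	/* The socket index changes on every step ... */
	size_t socket = step % nr_sockets;
	/* ... the thread index only once all sockets have been visited. */
	size_t thread = (step / nr_sockets) % threads_per_socket;

	return socket * threads_per_socket + thread;
}
```

With 16 sockets of 120 threads, steps 0, 1 and 16 map to CPUs 0, 120
and 1, so every socket is checked once within the first 16 steps
rather than after 1800.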

Thanks and great work,
Dan

-- [1]

# bpftrace -e 'kprobe:clocksource_watchdog { @start[tid] = nsecs; }
kretprobe:clocksource_watchdog /@start[tid]/ {
@lat_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);}'

Mainline idle:
[16, 32)     2.9% |@                                                   |
[32, 64)    96.0% |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)    0.8% |                                                    |
[128, 256)   0.4% |                                                    |

Proposed idle:
[32, 64)     0.0% |                                                    |
[64, 128)    0.1% |                                                    |
[128, 256)  99.6% |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)   0.2% |                                                    |

Mainline loaded:
[8, 16)      0.2% |                                                    |
[16, 32)    82.7% |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32, 64)    17.0% |@@@@@@@@@@                                          |
[64, 128)    0.1% |                                                    |
[128, 256)   0.0% |                                                    |

Proposed loaded:
[16, 32)     6.2% |@@@                                                 |
[32, 64)    93.8% |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)    0.0% |                                                    |
--
Daniel J Blueman