Re: [patch 5/5] clocksource: Rewrite watchdog code completely

From: Thomas Gleixner

Date: Sun Mar 08 2026 - 05:53:46 EST

Daniel!

On Mon, Feb 23 2026 at 14:53, Thomas Gleixner wrote:
> On Sun, Feb 15 2026 at 20:18, Daniel J Blueman wrote:
>> On Mon, 2 Feb 2026 at 19:27, Thomas Gleixner <tglx@xxxxxxxxxx> wrote:
>> Good step forward! We can also reduce remote cacheline invalidation by
>> putting 'seq' into the cacheline after 'cpu_ts' by reordering:
>
> Good point.
>
>> With that said, with your latest change on the 1920 thread setup,
>> WATCHDOG_READOUT_MAX_US 1000 is still needed to avoid timeouts during
>> the previous adverse workload, however some timeouts are still seen
>> during massive parallel process teardowns.
>>
>> To limit overhead, perhaps it is sufficient to set the timeout to
>> 100us, avoid retries (as the hardware thread may continue to be busy
>> and will be rechecked later anyway), and log timeouts at the debug
>> level if at all.
>
> Something like the below should work even with 50us. I left the print at
> INFO level for now. We can either change it to pr_info_once() or to
> debug as you said.

Any chance you can give this a test ride on that 1920 thread
monstrosity?

Thanks,

tglx