Re: [PATCH v2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform

From: Feng Tang
Date: Thu Oct 13 2022 - 21:16:08 EST


On Fri, Oct 14, 2022 at 08:37:18AM +0800, Feng Tang wrote:
> On Thu, Oct 13, 2022 at 09:02:43AM -0700, Dave Hansen wrote:
> > On 10/13/22 06:12, Feng Tang wrote:
> > > @@ -1217,7 +1217,7 @@ static void __init check_system_tsc_reliable(void)
> > >  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > >  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > >  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > > -	    nr_online_nodes <= 2)
> > > +	    nr_online_nodes <= 4)
> > >  		tsc_disable_clocksource_watchdog();
> >
> > I still don't think we should perpetuate this hack.
> >
> > This just plain doesn't work with numa=off, numa=fake=..., or presumably in
> > cases where NUMA is disabled in the firmware and memory is interleaved
> > across all sockets.
> >
> > It also presumably doesn't work on two-socket systems that have
> > Cluster-on-Die or Sub-NUMA-Clustering where a single socket is chopped
> > up into multiple nodes.
>
> Yes, after you raised the 'nr_online_nodes' issue, Peter, Rui and I
> discussed the problem, and we plan to post an RFC patch as in
> https://lore.kernel.org/lkml/Y0UgeUIJSFNR4mQB@feng-clx/
>
> It can cover:
> - the numa=fake=... case
> - platforms with DRAM nodes plus CPU-less HBM/PMEM nodes
>
> The 'sub-numa-clustering' case can't be covered, so the TSC will
> still be checked by the watchdog as before.
[...]


> For the numa=off case, only one CPU is brought up, so I think lifting
> the TSC watchdog is fine.

Sorry, I was wrong about this. 'numa=off' still brings up all CPUs,
but skips SRAT table initialization and reports only one node, so this
is another case that the fix patch can't cover.

Thanks,
Feng