Re: [PATCH] x86/tsc: Extend the watchdog check exemption to 4S/8S machine
From: Feng Tang
Date: Mon Oct 10 2022 - 21:09:44 EST
On Mon, Oct 10, 2022 at 07:23:10AM -0700, Dave Hansen wrote:
> On 10/9/22 18:23, Feng Tang wrote:
> >>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> >>> index cafacb2e58cc..b4ea79cb1d1a 100644
> >>> --- a/arch/x86/kernel/tsc.c
> >>> +++ b/arch/x86/kernel/tsc.c
> >>> @@ -1217,7 +1217,7 @@ static void __init check_system_tsc_reliable(void)
> >>> if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >>> boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >>> boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> >>> - nr_online_nodes <= 2)
> >>> + nr_online_nodes <= 8)
> >> So you're saying all 8 socket systems since Broadwell (?) are TSC
> >> sync'ed ?
> > No, I didn't mean that. I haven't got chance to any 8 sockets
> > machine, and I got a report last month that on one 8S machine,
> > the TSC was judged 'unstable' by HPET as watchdog.
>
> That's not a great check. Think about numa=fake=4U, for instance. Or a
> single-socket system with persistent memory and high bandwidth memory.
>
> Basically 'nr_online_nodes' is a software construct. It's going to be
> really hard to infer anything from it about what the _hardware_ is.
You are right! Getting the socket count was indeed a problem when I
worked on commit b50db7095fe0; the difficulty is the initialization
order. This TSC check needs to be done in tsc_init(), while
node_stats[] only gets initialized later, in smp_init().
For the cases you mentioned above, I dug out some old logs which show
the init order:
numa=fake=4 on a SKL desktop
================
[ 0.000066] [tsc_early_init()]: nr_online_nodes = 1
[ 0.000068] [tsc_early_init()]: nr_cpu_nodes = 0
[ 0.000070] [tsc_early_init()]: nr_mem_nodes = 0
[ 0.104015] [tsc_init()]: nr_online_nodes = 4
[ 0.104019] [tsc_init()]: nr_cpu_nodes = 0
[ 0.104022] [tsc_init()]: nr_mem_nodes = 4
[ 0.124778] smp: Brought up 4 nodes, 4 CPUs
[ 0.760915] [init_tsc_clocksource()]: nr_online_nodes = 4
[ 0.760919] [init_tsc_clocksource()]: nr_cpu_nodes = 4
[ 0.760922] [init_tsc_clocksource()]: nr_mem_nodes = 4
QEMU with 2 CPU-DRAM nodes + 2 Persistent memory nodes
========================================================
[ 0.066651] [tsc_early_init()]: nr_online_nodes = 1
[ 0.067494] [tsc_early_init()]: nr_cpu_nodes = 0
[ 0.068288] [tsc_early_init()]: nr_mem_nodes = 0
[ 0.677694] [tsc_init()]: nr_online_nodes = 4
[ 0.678862] [tsc_init()]: nr_cpu_nodes = 0
[ 0.679962] [tsc_init()]: nr_mem_nodes = 4
[ 1.139240] [init_tsc_clocksource()]: nr_online_nodes = 4
[ 1.140576] [init_tsc_clocksource()]: nr_cpu_nodes = 2
[ 1.141823] [init_tsc_clocksource()]: nr_mem_nodes = 4
[ 1.660100] [kernel_init()]: nr_online_nodes = 4
[ 1.661234] [kernel_init()]: nr_cpu_nodes = 2
[ 1.662300] [kernel_init()]: nr_mem_nodes = 4
'nr_online_nodes' was chosen in the hope that, in the worst case,
the patch is just a nop and won't wrongly lift the check.
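To make the "nop in the worst case" point concrete, here is a minimal
userspace sketch of the condition from the quoted hunk. It is only a model,
not kernel code: tsc_watchdog_exempt() and its parameters are hypothetical
names, and the three booleans stand in for the boot_cpu_has() feature tests.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the exemption condition in
 * check_system_tsc_reliable(). The booleans stand in for the
 * X86_FEATURE_CONSTANT_TSC, X86_FEATURE_NONSTOP_TSC and
 * X86_FEATURE_TSC_ADJUST tests; max_nodes is the node limit
 * (2 before the patch, 8 with it).
 */
static bool tsc_watchdog_exempt(bool constant_tsc, bool nonstop_tsc,
				bool tsc_adjust, int nr_online_nodes,
				int max_nodes)
{
	assert(max_nodes > 0);

	return constant_tsc && nonstop_tsc && tsc_adjust &&
	       nr_online_nodes <= max_nodes;
}
```

With the values from the numa=fake=4 log (nr_online_nodes = 4 at tsc_init())
and the original limit of 2, the model returns false, i.e. the watchdog
check is simply kept.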
One possible solution for this problem is to leverage the early SRAT
table init, which runs before tsc_init() and can provide CPU node
info. I will try this approach.
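A rough userspace sketch of the SRAT idea, under stated assumptions: SRAT
entries associate either a processor or a memory range with a proximity
domain, so counting the domains that have at least one processor entry
gives the number of CPU nodes. The struct, enum, MAX_DOMAINS bound and
count_cpu_nodes() below are hypothetical simplifications for illustration,
not the kernel's ACPI parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAINS 64	/* arbitrary bound chosen for this sketch */

/* Hypothetical, simplified stand-in for one SRAT affinity entry. */
enum srat_kind { SRAT_CPU, SRAT_MEMORY };

struct srat_entry {
	enum srat_kind kind;
	unsigned int domain;	/* proximity domain of the entry */
};

/* Count distinct proximity domains that have at least one CPU entry. */
static int count_cpu_nodes(const struct srat_entry *tbl, size_t n)
{
	bool has_cpu[MAX_DOMAINS] = { false };
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(tbl[i].domain < MAX_DOMAINS);
		if (tbl[i].kind == SRAT_CPU && !has_cpu[tbl[i].domain]) {
			has_cpu[tbl[i].domain] = true;
			count++;
		}
	}
	return count;
}

/* Made-up layout modelled on the QEMU case: 2 CPU+DRAM nodes, 2 pmem-only nodes. */
static const struct srat_entry qemu_like[] = {
	{ SRAT_CPU, 0 }, { SRAT_CPU, 0 }, { SRAT_MEMORY, 0 },
	{ SRAT_CPU, 1 }, { SRAT_CPU, 1 }, { SRAT_MEMORY, 1 },
	{ SRAT_MEMORY, 2 },
	{ SRAT_MEMORY, 3 },
};
```

For this made-up table, count_cpu_nodes() returns 2 even though four
domains are present, which is the distinction the check would need at
tsc_init() time.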
Thanks,
Feng