Re: [bisected] x86 boot still broken on -rc2
From: Prarit Bhargava
Date: Mon Dec 04 2017 - 11:46:08 EST
On 12/04/2017 08:13 AM, Prarit Bhargava wrote:
>
>
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1 #2 #3 #4
> .... node #1, CPUs: #5 #6 #7 #8 #9
> .... node #0, CPUs: #10 #11 #12 #13 #14
> .... node #1, CPUs: #15 #16 #17 #18 #19
> smp: Brought up 2 nodes, 20 CPUs
> smpboot: Max logical packages: 1
>
> which means that the calculation of logical packages is wrong because
>
> ncpus = cpu_data(0).booted_cores * smp_num_siblings;
> ncpus = 10 * 2;
> ncpus = 20;
>
> smp_num_siblings is defined as "The number of threads in a core" which
> should be 1 if HT/SMT is disabled.
>
> It looks like my patch has exposed a bug in the
> smp_num_siblings calculation. I'm still debugging ...
The bug is that, on systems with a CPUID level of 0xb or greater,
smp_num_siblings is calculated as the *maximum* number of threads in a core
rather than the actual number of threads in a core (see
arch/x86/kernel/cpu/topology.c:59).
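To illustrate where the value comes from, here is a minimal userspace sketch of decoding one CPUID leaf 0xb sub-leaf. The helper names are mine, chosen to mirror the bit-field macros in topology.c, and any register values passed in are hypothetical rather than dumped from the failing machine.

```c
#include <stdint.h>

/* Level types reported in ECX[15:8] of CPUID leaf 0xb. */
#define SMT_TYPE	1
#define CORE_TYPE	2

/* EBX[15:0]: number of logical processors reported at this level. */
static unsigned int level_max_siblings(uint32_t ebx)
{
	return ebx & 0xffff;
}

/* ECX[15:8]: level type of this sub-leaf (SMT_TYPE, CORE_TYPE, ...). */
static unsigned int leafb_subtype(uint32_t ecx)
{
	return (ecx >> 8) & 0xff;
}
```

With hypothetical sub-leaf 0 values ebx = 0x2 and ecx = 0x100, leafb_subtype() identifies the SMT level and level_max_siblings() returns 2. As described above, that count is a maximum, so it does not drop to 1 when SMT is disabled.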
A proper fix for that will take some time to investigate. In the meantime, the
patch below fixes the problem in the short term.
I've tested the patch using SMT enabled, SMT disabled, maxcpus=1 and nr_cpus=1.
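For reference, the arithmetic on the failing box can be sketched as follows. This is a standalone illustration, not kernel code: max_logical_packages() is a hypothetical helper, nr_cpu_ids = 20 is assumed from the "20 CPUs" line in the boot log, and threads_per_core = 1 assumes topology_max_smt_threads() returns 1 when SMT is disabled.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Estimate the number of packages by extrapolating the boot CPU's
 * core count to the whole system, as native_smp_cpus_done() does.
 */
static unsigned int max_logical_packages(unsigned int nr_cpu_ids,
					 unsigned int booted_cores,
					 unsigned int threads_per_core)
{
	unsigned int ncpus = booted_cores * threads_per_core;

	assert(ncpus > 0);	/* guard the division in this sketch */
	return DIV_ROUND_UP(nr_cpu_ids, ncpus);
}
```

With smp_num_siblings = 2, max_logical_packages(20, 10, 2) gives 1, which matches the bogus "Max logical packages: 1" in the log. With one thread per core, max_logical_packages(20, 10, 1) gives 2, which matches the two nodes that were brought up.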
tglx, please revert b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages
estimate") if you think that is a better option. The problem with
smp_num_siblings has been around for almost a decade.
P.
---8<---
Subject: [PATCH] arch/x86: Do not use smp_num_siblings in
__max_logical_packages calculation
Documentation/x86/topology.txt defines smp_num_siblings as "The number of
threads in a core". Since commit bbb65d2d365e ("x86: use cpuid vector 0xb
when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core. If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.
Use topology_max_smt_threads() in the __max_logical_packages calculation.
Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
Cc: Jakub Kicinski <kubakici@xxxxx>
Cc: "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Clark Williams <williams@xxxxxxxxxx>
---
arch/x86/kernel/smpboot.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3d01df7d7cf6..eaee15fb7d8b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1304,7 +1304,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	 * Today neither Intel nor AMD support heterogenous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
-	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
--
1.8.3.1