[PATCH V3 6/8] x86/topology: Fix max_siblings calculation
From: Zhang Rui
Date: Thu Sep 22 2022 - 09:36:06 EST
The max siblings value returned by CPUID.1F SMT level EBX differs among
CPUs on Intel Hybrid platforms like ADL-S/P.
It returns 2 for Pcore CPUs which have HT sibling and 1 for Ecore CPUs
which do not.
Today, CPUID SMT level EBX sets the global variable smp_num_siblings.
Thus, smp_num_siblings is overridden to different values based on the CPU
Pcore/Ecore enumeration order.
For example,
[ 0.201005] detect_extended_topology: CPU APICID 0x0, smp_num_siblings 2, x86_max_cores 10
[ 0.201117] start_kernel->check_bugs->cpu_smt_check_topology: smp_num_siblings 2
...
[ 0.010146] detect_extended_topology: CPU APICID 0x8, smp_num_siblings 2, x86_max_cores 10
...
[ 0.010146] detect_extended_topology: CPU APICID 0x39, smp_num_siblings 2, x86_max_cores 10
[ 0.010146] detect_extended_topology: CPU APICID 0x48, smp_num_siblings 1, x86_max_cores 20
...
[ 0.010146] detect_extended_topology: CPU APICID 0x4e, smp_num_siblings 1, x86_max_cores 20
[ 2.583800] sched_set_itmt_core_prio: smp_num_siblings 1
This inconsistency brings several potential issues:
1. some kernel configuration like cpu_smt_control, as set in
start_kernel()->check_bugs()->cpu_smt_check_topology(), depends on
smp_num_siblings set by cpu0.
It is pure luck that all the current hybrid platforms use Pcore as cpu0
and hide this problem.
2. some per CPU data like cpuinfo_x86.x86_max_cores that depends on
smp_num_siblings becomes inconsistent and bogus.
3. the final smp_num_siblings value after boot depends on the last CPU
enumerated, which could either be Pcore or Ecore CPU.
The solution is to use CPUID EAX bits_shift to get the maximum number of
addressable logical processors, and use this to determin max siblings.
Because:
1. the CPUID EAX bits_shift values are consistent among CPUs as far as
observed.
2. some code already uses smp_num_siblings value to isolate the SMT ID
bits in APIC-ID, like apic_id_is_primary_thread().
Suggested-by: Len Brown <len.brown@xxxxxxxxx>
Reviewed-by: Len Brown <len.brown@xxxxxxxxx>
Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
---
arch/x86/kernel/cpu/topology.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..2a88f2fa5756 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -23,7 +23,12 @@
#define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
#define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
-#define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)
+
+/*
+ * Use EAX bit_shift to calculate the maximum number of addressable logical
+ * processors sharing the current level.
+ */
+#define LEVEL_MAX_SIBLINGS(eax) (1 << BITS_SHIFT_NEXT_LEVEL(eax))
unsigned int __max_die_per_package __read_mostly = 1;
EXPORT_SYMBOL(__max_die_per_package);
@@ -79,7 +84,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
* initial apic id, which also represents 32-bit extended x2apic id.
*/
c->initial_apicid = edx;
- smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
#endif
return 0;
}
@@ -109,9 +114,9 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
*/
cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
c->initial_apicid = edx;
- core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
sub_index = 1;
@@ -122,14 +127,14 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
* Check for the Core type in the implemented sub leaves.
*/
if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
- core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = core_level_siblings;
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
die_level_present = true;
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
--
2.25.1