[PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit

From: Vitaly Kuznetsov
Date: Thu Apr 20 2017 - 09:25:08 EST


Recent changes in logical package management (commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. For example, when I
boot a 10-CPU guest on an AMD Opteron 4284 system, I see the following crash:

[ 0.116104] smpboot: Max logical packages: 1
...
[ 0.590068] #8
[ 0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[ 0.001000] ------------[ cut here ]------------
[ 0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!

This happens because total_cpus is 10 and x86_max_cores is 16(!), so
__max_logical_packages = DIV_ROUND_UP(total_cpus, x86_max_cores) =
DIV_ROUND_UP(10, 16) = 1, yet CPU 8 already needs a second logical package
(package 1). It turns out that the number of CPUs (vCPUs in our case) in
each logical package doesn't have to be exactly x86_max_cores: a package
can have any number of CPUs <= x86_max_cores, and the counts don't have to
match across logical packages. This breaks the current concept of
__max_logical_packages.
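To make the arithmetic concrete, here is a small user-space sketch (not kernel code) of the estimate the patch removes. DIV_ROUND_UP mirrors the kernel macro of the same name; old_max_logical_packages(), count_packages() and MODEL_MAX_PKGS are hypothetical helpers, and the per-CPU package layout (CPUs 0-7 in package 0, CPUs 8-9 in package 1) is made up, chosen only to be consistent with the log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Upper bound on package ids this toy model accepts (an assumption). */
#define MODEL_MAX_PKGS 64u

/* Estimate removed by the patch: total_cpus / x86_max_cores, rounded up. */
static unsigned int old_max_logical_packages(unsigned int total_cpus,
					     unsigned int x86_max_cores)
{
	/* The kernel substitutes 1 for a zero x86_max_cores; the model refuses it. */
	assert(x86_max_cores != 0);
	return DIV_ROUND_UP(total_cpus, x86_max_cores);
}

/* Number of distinct packages that actually appear in a per-CPU table. */
static unsigned int count_packages(const unsigned int *pkg_of_cpu,
				   unsigned int ncpus)
{
	bool seen[MODEL_MAX_PKGS] = { false };
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
		unsigned int pkg = pkg_of_cpu[cpu];

		assert(pkg < MODEL_MAX_PKGS);
		if (!seen[pkg]) {
			seen[pkg] = true;
			n++;
		}
	}
	return n;
}

/* Hypothetical layout: CPUs 0-7 in package 0, CPUs 8-9 in package 1. */
static const unsigned int example_layout[10] = { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 };
```

Here old_max_logical_packages(10, 16) returns 1 while count_packages(example_layout, 10) returns 2: the second logical package (id 1) is not below the precomputed limit of 1, which is the mismatch the log reports for CPU 8.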

In this patch I suggest setting __max_logical_packages to the smaller of
max_physical_pkg_id and total_cpus; this should be safe and cover all
possible cases. Alternatively, we could eliminate the concept of
__max_logical_packages completely and rely on max_physical_pkg_id/
total_cpus wherever we currently use topology_max_packages().
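The proposed limit can be sanity-checked the same way. The sketch below is a user-space model, not the kernel function: it assumes the x86_64 value of MAX_LOCAL_APIC (32768), uses a hypothetical min_uint() helper in place of the kernel's type-checked min(), and keeps the fallback of treating a zero x86_max_cores as 1.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed x86_64 value of MAX_LOCAL_APIC; 32-bit kernels use 256. */
#define MAX_LOCAL_APIC 32768u

/* Plain helper standing in for the kernel's type-checked min(). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* User-space model of the limit proposed by the patch. */
static unsigned int new_max_logical_packages(unsigned int total_cpus,
					     unsigned int x86_max_cores)
{
	/* Same fallback as smp_init_package_map(): treat 0 cores as 1. */
	unsigned int ncpus = x86_max_cores ? x86_max_cores : 1;
	unsigned int max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

	assert(total_cpus > 0); /* at least the boot CPU exists */
	return min_uint(max_physical_pkg_id, total_cpus);
}
```

For the 10-vCPU guest with x86_max_cores = 16, max_physical_pkg_id is 2048, so the limit becomes min(2048, 10) = 10. Every logical package contains at least one CPU, so there can never be more logical packages than total_cpus, no matter how unevenly the CPUs are spread.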

I guess the issue could also be solved in Xen: the x86_max_cores value
returned by CPUID could be tweaked to be the smallest number of vCPUs in
any of the guest's logical packages.
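Why the smallest per-package count is the right value to report: if each of the guest's npkgs packages holds at least m vCPUs, then npkgs * m <= total_cpus, so npkgs <= DIV_ROUND_UP(total_cpus, m) and the unchanged kernel estimate can no longer come out too small. The following is a hypothetical user-space model of that idea, not a Xen interface; smallest_pkg_size(), estimate_with_reported_cores() and example_pkg_sizes are illustrative names, and the 8 + 2 split of the 10 vCPUs is made up.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical Xen-side choice: report, as the per-package core count,
 * the smallest number of vCPUs found in any of the guest's packages.
 */
static unsigned int smallest_pkg_size(const unsigned int *cpus_per_pkg,
				      unsigned int npkgs)
{
	unsigned int m;

	assert(npkgs > 0);
	m = cpus_per_pkg[0];
	for (unsigned int i = 1; i < npkgs; i++)
		if (cpus_per_pkg[i] < m)
			m = cpus_per_pkg[i];
	return m;
}

/* The kernel's old estimate, evaluated with the adjusted core count. */
static unsigned int estimate_with_reported_cores(unsigned int total_cpus,
						 unsigned int reported_cores)
{
	assert(reported_cores > 0);
	return DIV_ROUND_UP(total_cpus, reported_cores);
}

/* Made-up example: 10 vCPUs split 8 + 2 across two packages. */
static const unsigned int example_pkg_sizes[2] = { 8, 2 };
```

With sizes {8, 2}, m = 2 and the estimate becomes DIV_ROUND_UP(10, 2) = 5, which is at least the 2 packages actually present; the price is an overestimate rather than a crash.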

Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
arch/x86/kernel/smpboot.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has no more than x86_max_cores CPUs, but it
+	 * may have fewer: e.g. there may be 1 CPU per logical package
+	 * regardless of what x86_max_cores says. This is seen on some Xen
+	 * setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
--
2.9.3