Re: [patch 5/9] x86: Cure per CPU madness on UP

From: Guenter Roeck
Date: Thu Mar 21 2024 - 10:07:03 EST


On 3/21/24 04:14, Thomas Gleixner wrote:
On Wed, Mar 20 2024 at 08:46, Guenter Roeck wrote:
On 3/20/24 01:58, Thomas Gleixner wrote:
On Fri, Mar 15 2024 at 09:17, Guenter Roeck wrote:
I don't know the code well enough to determine what is wrong.
Please let me know what I can do to help debugging the problem.

Could you provide me the config and the qemu command line?


defconfig with CONFIG_SMP disabled, and

qemu-system-x86_64 -kernel arch/x86/boot/bzImage -cpu Haswell \
--append "console=ttyS0" -nographic -monitor none

The cpu doesn't really matter as long as it is an Intel CPU.
A root file system isn't needed since the boot doesn't get that far.

Now it gets interesting because I can't reproduce it with that setup at
all.

What's weird is that I saw it exactly once on 64-bit in a VM with a UP
config two days ago, but when I started to add instrumentation it never
came back even after backing the instrumentation changes out. I have
seriously no idea what's going on there.

Is it fully reproducible on your side?


Yes, always.

If so can you please provide a full dmesg and then apply the patch below
and provide the resulting full dmesg too?


You'll find everything at http://server.roeck-us.net/qemu/x86-nosmp/

The crash is gone after applying your patch. The difference is:

+	/*
+	 * If there was no APIC registered, then the map check below would
+	 * fail. With no APIC this is guaranteed to be an UP system and
+	 * therefore all topology levels have only one entry and their
+	 * logical ID is obviously 0.
+	 */
+	if (topo_info.boot_cpu_apic_id == BAD_APICID) {
+		pr_info("#### topo_info.boot_cpu_apic_id == BAD_APICID\n");
		^^^^ I added this
+		return 0;
+	}
+

I see the "#### topo_info.boot_cpu_apic_id == BAD_APICID" message
twice in the log. See patched.log at the page pointed to above.
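
For anyone following along without the tree at hand, here is a minimal user-space sketch of what the added check does, as I understand it. This is not the kernel code: only topo_info.boot_cpu_apic_id and BAD_APICID come from the patch. The struct layout, the BAD_APICID value, the function name logical_id_sketch(), the guard_hits counter and the stubbed "map check" are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value for illustration; the kernel's definition may differ. */
#define BAD_APICID 0xFFFFu

/* Hypothetical stand-in for the kernel's topo_info; only this field is modelled. */
struct topo_info_sketch {
	unsigned int boot_cpu_apic_id;
};

/* Counts how often the early return is taken (stands in for the pr_info()). */
static int guard_hits;

/*
 * Hypothetical lookup: returns a logical ID, or -1 if the lookup fails.
 * The "map check" is stubbed: it only knows the boot CPU's APIC ID.
 */
static int logical_id_sketch(const struct topo_info_sketch *ti,
			     unsigned int apicid)
{
	assert(ti != NULL);

	/* No APIC registered: UP system, so the logical ID is 0. */
	if (ti->boot_cpu_apic_id == BAD_APICID) {
		guard_hits++;
		return 0;
	}

	/* Stubbed map check: succeeds only for the registered boot CPU. */
	return apicid == ti->boot_cpu_apic_id ? 0 : -1;
}
```

Calling logical_id_sketch() twice with boot_cpu_apic_id set to BAD_APICID returns 0 both times and bumps guard_hits to 2, which mirrors the two messages in patched.log. Without the early return, the stubbed lookup would fail in that case.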

Hope this helps,
Guenter