Re: [PATCH] use x86 cpu park to speedup smp_init in kexec situation

From: David Woodhouse
Date: Thu Jan 21 2021 - 10:45:22 EST


On Thu, 2021-01-21 at 15:55 +0100, Thomas Gleixner wrote:
> > Testing on real hardware has been more interesting and less useful so
> > far. We started with the CPUHP_BRINGUP_KICK_CPU state being
> > *immediately* before CPUHP_BRINGUP_CPU. On my 28-thread Haswell box,
> > that didn't come up at all even without actually *doing* anything in
> > the pre-bringup phase. Merely bringing all the AP threads up through
> > the various CPUHP_PREPARE_foo stages before actually bringing them
> > online, was enough to break it. I have no serial port on this box so we
> > haven't yet worked out why; I've resorted to putting the
> > CPUHP_BRINGUP_KICK_CPU state before CPUHP_WORKQUEUE_PREP instead.
>
> Hrm.

Aha, I managed to reproduce in qemu. It's CPUHP_X2APIC_PREPARE, which
is only used in x2apic *cluster* mode not physical mode. So I actually
need to give the guest an IOMMU with IRQ remapping before I see it.


$ git diff
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index bc56287a1ed1..f503e66b4718 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -92,6 +92,7 @@ enum cpuhp_state {
 	CPUHP_MIPS_SOC_PREPARE,
 	CPUHP_BP_PREPARE_DYN,
 	CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
+	CPUHP_BRINGUP_WAKE_CPU,
 	CPUHP_BRINGUP_CPU,
 	CPUHP_AP_IDLE_DEAD,
 	CPUHP_AP_OFFLINE,
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2b8d7a5db383..6c6f2986bfdb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1336,6 +1336,12 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
 {
 	unsigned int cpu;
 
+	for_each_present_cpu(cpu) {
+		if (num_online_cpus() >= setup_max_cpus)
+			break;
+		if (!cpu_online(cpu))
+			cpu_up(cpu, CPUHP_BRINGUP_WAKE_CPU);
+	}
 	for_each_present_cpu(cpu) {
 		if (num_online_cpus() >= setup_max_cpus)
 			break;
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -append "console=ttyS0 trace_event=cpuhp tp_printk" -display none -serial mon:stdio -m 2G -M q35,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on -smp 40
...
[ 0.349968] smp: Bringing up secondary CPUs ...
[ 0.350281] cpuhp_enter: cpu: 0001 target: 42 step: 1 (smpboot_create_threads)
[ 0.351421] cpuhp_exit: cpu: 0001 state: 1 step: 1 ret: 0
[ 0.352074] cpuhp_enter: cpu: 0001 target: 42 step: 2 (perf_event_init_cpu)
[ 0.352276] cpuhp_exit: cpu: 0001 state: 2 step: 2 ret: 0
[ 0.353273] cpuhp_enter: cpu: 0001 target: 42 step: 37 (workqueue_prepare_cpu)
[ 0.354377] cpuhp_exit: cpu: 0001 state: 37 step: 37 ret: 0
[ 0.355273] cpuhp_enter: cpu: 0001 target: 42 step: 39 (hrtimers_prepare_cpu)
[ 0.356271] cpuhp_exit: cpu: 0001 state: 39 step: 39 ret: 0
[ 0.356937] cpuhp_enter: cpu: 0001 target: 42 step: 41 (x2apic_prepare_cpu)
[ 0.357277] cpuhp_exit: cpu: 0001 state: 41 step: 41 ret: 0
[ 0.358278] cpuhp_enter: cpu: 0002 target: 42 step: 1 (smpboot_create_threads)
...
[ 0.614278] cpuhp_enter: cpu: 0032 target: 42 step: 1 (smpboot_create_threads)
[ 0.615610] cpuhp_exit: cpu: 0032 state: 1 step: 1 ret: 0
[ 0.616274] cpuhp_enter: cpu: 0032 target: 42 step: 2 (perf_event_init_cpu)
[ 0.617271] cpuhp_exit: cpu: 0032 state: 2 step: 2 ret: 0
[ 0.618272] cpuhp_enter: cpu: 0032 target: 42 step: 37 (workqueue_prepare_cpu)
[ 0.619388] cpuhp_exit: cpu: 0032 state: 37 step: 37 ret: 0
[ 0.620273] cpuhp_enter: cpu: 0032 target: 42 step: 39 (hrtimers_prepare_cpu)
[ 0.621270] cpuhp_exit: cpu: 0032 state: 39 step: 39 ret: 0
[ 0.622009] cpuhp_enter: cpu: 0032 target: 42 step: 41 (x2apic_prepare_cpu)
[ 0.622275] cpuhp_exit: cpu: 0032 state: 41 step: 41 ret: 0
...
[ 0.684272] cpuhp_enter: cpu: 0039 target: 42 step: 41 (x2apic_prepare_cpu)
[ 0.685277] cpuhp_exit: cpu: 0039 state: 41 step: 41 ret: 0
[ 0.685979] cpuhp_enter: cpu: 0001 target: 217 step: 43 (smpcfd_prepare_cpu)
[ 0.686283] cpuhp_exit: cpu: 0001 state: 43 step: 43 ret: 0
[ 0.687274] cpuhp_enter: cpu: 0001 target: 217 step: 44 (relay_prepare_cpu)
[ 0.688274] cpuhp_exit: cpu: 0001 state: 44 step: 44 ret: 0
[ 0.689274] cpuhp_enter: cpu: 0001 target: 217 step: 47 (rcutree_prepare_cpu)
[ 0.690271] cpuhp_exit: cpu: 0001 state: 47 step: 47 ret: 0
[ 0.690982] cpuhp_multi_enter: cpu: 0001 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 0.691281] cpuhp_exit: cpu: 0001 state: 59 step: 59 ret: 0
[ 0.692272] cpuhp_multi_enter: cpu: 0001 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 0.694640] cpuhp_exit: cpu: 0001 state: 59 step: 59 ret: 0
[ 0.695272] cpuhp_multi_enter: cpu: 0001 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 0.696280] cpuhp_exit: cpu: 0001 state: 59 step: 59 ret: 0
[ 0.697279] cpuhp_enter: cpu: 0001 target: 217 step: 65 (timers_prepare_cpu)
[ 0.698168] cpuhp_exit: cpu: 0001 state: 65 step: 65 ret: 0
[ 0.698272] cpuhp_enter: cpu: 0001 target: 217 step: 67 (kvmclock_setup_percpu)
[ 0.699270] cpuhp_exit: cpu: 0001 state: 67 step: 67 ret: 0
[ 0.700272] cpuhp_enter: cpu: 0001 target: 217 step: 88 (bringup_cpu)
[ 0.701312] x86: Booting SMP configuration:
[ 0.702270] .... node #0, CPUs: #1
[ 0.127218] kvm-clock: cpu 1, msr 59401041, secondary cpu clock
[ 0.127218] smpboot: CPU 1 Converting physical 0 to logical die 1
[ 0.709281] cpuhp_enter: cpu: 0001 target: 217 step: 147 (smpboot_unpark_threads)
[ 0.712294] cpuhp_exit: cpu: 0001 state: 147 step: 147 ret: 0
[ 0.714283] cpuhp_enter: cpu: 0001 target: 217 step: 149 (irq_affinity_online_cpu)
[ 0.717292] cpuhp_exit: cpu: 0001 state: 149 step: 149 ret: 0
[ 0.719283] cpuhp_enter: cpu: 0001 target: 217 step: 153 (perf_event_init_cpu)
[ 0.721279] cpuhp_exit: cpu: 0001 state: 153 step: 153 ret: 0
[ 0.724285] cpuhp_enter: cpu: 0001 target: 217 step: 179 (lockup_detector_online_cpu)
[ 0.727279] cpuhp_exit: cpu: 0001 state: 179 step: 179 ret: 0
[ 0.729279] cpuhp_enter: cpu: 0001 target: 217 step: 180 (workqueue_online_cpu)
[ 0.731309] cpuhp_exit: cpu: 0001 state: 180 step: 180 ret: 0
[ 0.733281] cpuhp_enter: cpu: 0001 target: 217 step: 181 (rcutree_online_cpu)
[ 0.735276] cpuhp_exit: cpu: 0001 state: 181 step: 181 ret: 0
[ 0.737278] cpuhp_enter: cpu: 0001 target: 217 step: 183 (kvm_cpu_online)
[ 0.739286] kvm-guest: stealtime: cpu 1, msr 7d46c080
[ 0.740274] cpuhp_exit: cpu: 0001 state: 183 step: 183 ret: 0
[ 0.742278] cpuhp_enter: cpu: 0001 target: 217 step: 184 (page_writeback_cpu_online)
[ 0.744275] cpuhp_exit: cpu: 0001 state: 184 step: 184 ret: 0
[ 0.745277] cpuhp_enter: cpu: 0001 target: 217 step: 185 (vmstat_cpu_online)
[ 0.747276] cpuhp_exit: cpu: 0001 state: 185 step: 185 ret: 0
[ 0.749280] cpuhp_enter: cpu: 0001 target: 217 step: 216 (sched_cpu_activate)
[ 0.750275] cpuhp_exit: cpu: 0001 state: 216 step: 216 ret: 0
[ 0.752273] cpuhp_exit: cpu: 0001 state: 217 step: 88 ret: 0
[ 0.753030] cpuhp_enter: cpu: 0002 target: 217 step: 43 (smpcfd_prepare_cpu)
...
[ 2.311273] cpuhp_exit: cpu: 0031 state: 217 step: 88 ret: 0
[ 2.312278] cpuhp_enter: cpu: 0032 target: 217 step: 43 (smpcfd_prepare_cpu)
[ 2.313119] cpuhp_exit: cpu: 0032 state: 43 step: 43 ret: 0
[ 2.313277] cpuhp_enter: cpu: 0032 target: 217 step: 44 (relay_prepare_cpu)
[ 2.314275] cpuhp_exit: cpu: 0032 state: 44 step: 44 ret: 0
[ 2.315274] cpuhp_enter: cpu: 0032 target: 217 step: 47 (rcutree_prepare_cpu)
[ 2.316104] cpuhp_exit: cpu: 0032 state: 47 step: 47 ret: 0
[ 2.316273] cpuhp_multi_enter: cpu: 0032 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 2.317292] cpuhp_exit: cpu: 0032 state: 59 step: 59 ret: 0
[ 2.318275] cpuhp_multi_enter: cpu: 0032 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 2.320401] cpuhp_exit: cpu: 0032 state: 59 step: 59 ret: 0
[ 2.321111] cpuhp_multi_enter: cpu: 0032 target: 217 step: 59 (trace_rb_cpu_prepare)
[ 2.321286] cpuhp_exit: cpu: 0032 state: 59 step: 59 ret: 0
[ 2.322273] cpuhp_enter: cpu: 0032 target: 217 step: 65 (timers_prepare_cpu)
[ 2.323271] cpuhp_exit: cpu: 0032 state: 65 step: 65 ret: 0
[ 2.324272] cpuhp_enter: cpu: 0032 target: 217 step: 67 (kvmclock_setup_percpu)
[ 2.325133] cpuhp_exit: cpu: 0032 state: 67 step: 67 ret: 0
[ 2.325273] cpuhp_enter: cpu: 0032 target: 217 step: 88 (bringup_cpu)
[ 2.326292] #32
[ 2.289283] kvm-clock: cpu 32, msr 59401801, secondary cpu clock
[ 2.289283] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2.289283] #PF: supervisor write access in kernel mode
[ 2.289283] #PF: error_code(0x0002) - not-present page
[ 2.289283] PGD 0 P4D 0
[ 2.289283] Oops: 0002 [#1] SMP PTI
[ 2.289283] CPU: 32 PID: 0 Comm: swapper/32 Not tainted 5.10.0+ #745
[ 2.289283] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[ 2.289283] RIP: 0010:init_x2apic_ldr+0xa0/0xb0
[ 2.289283] Code: 89 2d 9c 81 fb 72 65 8b 15 cd 12 fb 72 89 d2 f0 48 0f ab 50 08 5b 5d c3 48 8b 05 a3 7b 09 02 48 c7 05 98 7b 09 02 00 00 00 00 <89> 18 eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 89
[ 2.289283] RSP: 0000:ffffb15e8016fec0 EFLAGS: 00010046
[ 2.289283] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000040
[ 2.289283] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000028
[ 2.289283] RBP: 0000000000018428 R08: 0000000000000000 R09: 0000000000000028
[ 2.289283] R10: ffffb15e8016fd78 R11: ffff88ca7ff28368 R12: 0000000000000200
[ 2.289283] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
[ 2.289283] FS: 0000000000000000(0000) GS:ffff88ca7dc00000(0000) knlGS:0000000000000000
[ 2.289283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.289283] CR2: 0000000000000000 CR3: 0000000058610000 CR4: 00000000000006a0
[ 2.289283] Call Trace:
[ 2.289283] setup_local_APIC+0x88/0x320
[ 2.289283] ? printk+0x48/0x4a
[ 2.289283] apic_ap_setup+0xa/0x20
[ 2.289283] start_secondary+0x2f/0x130
[ 2.289283] secondary_startup_64_no_verify+0xc2/0xcb
[ 2.289283] Modules linked in:
[ 2.289283] CR2: 0000000000000000
[ 2.289283] ---[ end trace 676dcdbf63e55075 ]---
[ 2.289283] RIP: 0010:init_x2apic_ldr+0xa0/0xb0
[ 2.289283] Code: 89 2d 9c 81 fb 72 65 8b 15 cd 12 fb 72 89 d2 f0 48 0f ab 50 08 5b 5d c3 48 8b 05 a3 7b 09 02 48 c7 05 98 7b 09 02 00 00 00 00 <89> 18 eb cd 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 89
[ 2.289283] RSP: 0000:ffffb15e8016fec0 EFLAGS: 00010046
[ 2.289283] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000040
[ 2.289283] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000028
[ 2.289283] RBP: 0000000000018428 R08: 0000000000000000 R09: 0000000000000028
[ 2.289283] R10: ffffb15e8016fd78 R11: ffff88ca7ff28368 R12: 0000000000000200
[ 2.289283] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
[ 2.289283] FS: 0000000000000000(0000) GS:ffff88ca7dc00000(0000) knlGS:0000000000000000
[ 2.289283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.289283] CR2: 0000000000000000 CR3: 0000000058610000 CR4: 00000000000006a0
[ 2.289283] Kernel panic - not syncing: Attempted to kill the idle task!
[ 2.289283] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
