Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs

From: Jinjie Ruan

Date: Mon Jun 15 2026 - 05:58:11 EST




On 6/12/2026 11:45 PM, Michael Kelley wrote:
> From: Jinjie Ruan <ruanjinjie@xxxxxxxxxx> Sent: Thursday, June 11, 2026 6:38 AM
>>
>> Support for parallel secondary CPU bringup is already utilized by x86,
>> MIPS, and RISC-V. This patch brings this capability to the arm64
>> architecture.
>>
>> Rework the global `secondary_data` accessed during early boot into
>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
>> enabling the early boot code in head.S to resolve each secondary CPU's
>> logical ID concurrently.
>>
>> To fully enable HOTPLUG_PARALLEL, this patch implements:
>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
>>
>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
>>
>> | test kernel        | secondary CPUs boot time |
>> | ------------------ | ------------------------ |
>> | Without this patch | 155.672                  |
>> | cpuhp.parallel=0   | 62.897                   |
>> | cpuhp.parallel=1   | 166.703                  |
>
> The last two rows seem mixed up. I would expect parallel=0 to
> result in a longer boot time.

Without this patch:

KVM event statistics (6 entries)

Event name   Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW      323112   75.00%   1669148000  17.00%            5165
WFx            85817   19.00%    723215800   7.00%            8427
SYS64          14914    3.00%    419934530   4.00%           28157
IRQ             5643    1.00%   6732439250  70.00%         1193060
HVC64            282    0.00%     35543970   0.00%          126042
IABT_LOW           1    0.00%         6130   0.00%            6130

cpuhp.parallel=0:

Event name   Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW      308175   80.00%    643628050   6.00%            2088
WFx            55208   14.00%    261925270   2.00%            4744
SYS64          14975    3.00%    155727880   1.00%           10399
IRQ             4755    1.00%   8496162210  88.00%         1786784
HVC64            280    0.00%     19429900   0.00%           69392
IABT_LOW           1    0.00%         5850   0.00%            5850

cpuhp.parallel=1:

Event name   Samples  Sample%    Time (ns)   Time%  Mean Time (ns)
DABT_LOW      307923   77.00%    692965050   2.00%            2250
WFx            59549   15.00%    287888960   0.00%            4834
SYS64          15127    3.00%    334366230   1.00%           22103
IRQ            12861    3.00%  29784004970  95.00%         2315838
HVC64            280    0.00%     21869940   0.00%           78106
IABT_LOW           1    0.00%         9320   0.00%            9320

- Without this patch: slowest mean HVC64 handling (126 μs), highest WFx
count (85.8k), and the most VM exits in total (about 430k samples).

- cpuhp.parallel=1: HVC64 latency improved to 78 μs (close to
cpuhp.parallel=0), but IRQ exits increased dramatically (12.9k, 2.7×
that of `cpuhp.parallel=0`), accounting for 95% of event time and
becoming the new bottleneck.

- cpuhp.parallel=0: Fastest HVC64 (69 μs), lowest IRQ exits (4.8k), and
lowest total samples, delivering the best overall boot performance.
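
The totals and ratios quoted in the three points above can be re-derived
from the tables. The following is only a minimal sketch for checking the
arithmetic: the sample counts and times are copied from the tables, while
the names `STATS`, `summarize` and `SUMMARY` are made up for illustration
and are not part of the patch or of any tool.

```python
# Per-event (samples, total time in ns), copied from the tables above.
STATS = {
    "no patch": {
        "DABT_LOW": (323112, 1669148000),
        "WFx": (85817, 723215800),
        "SYS64": (14914, 419934530),
        "IRQ": (5643, 6732439250),
        "HVC64": (282, 35543970),
        "IABT_LOW": (1, 6130),
    },
    "cpuhp.parallel=0": {
        "DABT_LOW": (308175, 643628050),
        "WFx": (55208, 261925270),
        "SYS64": (14975, 155727880),
        "IRQ": (4755, 8496162210),
        "HVC64": (280, 19429900),
        "IABT_LOW": (1, 5850),
    },
    "cpuhp.parallel=1": {
        "DABT_LOW": (307923, 692965050),
        "WFx": (59549, 287888960),
        "SYS64": (15127, 334366230),
        "IRQ": (12861, 29784004970),
        "HVC64": (280, 21869940),
        "IABT_LOW": (1, 9320),
    },
}


def summarize(events):
    """Return total exits, IRQ exits, IRQ share of event time, mean HVC64 time (us)."""
    total_exits = sum(samples for samples, _ in events.values())
    total_ns = sum(ns for _, ns in events.values())
    irq_samples, irq_ns = events["IRQ"]
    hvc_samples, hvc_ns = events["HVC64"]
    return {
        "total_exits": total_exits,
        "irq_exits": irq_samples,
        "irq_time_share": irq_ns / total_ns,
        "hvc64_mean_us": hvc_ns / hvc_samples / 1000,
    }


# One summary per kernel configuration (hypothetical helper structure).
SUMMARY = {name: summarize(events) for name, events in STATS.items()}

if __name__ == "__main__":
    for name, s in SUMMARY.items():
        print(
            f"{name:18} exits={s['total_exits']} irq={s['irq_exits']} "
            f"irq_time={s['irq_time_share']:.1%} hvc64={s['hvc64_mean_us']:.1f}us"
        )
```

Running it prints one line per configuration, so the IRQ exit counts and
the IRQ share of event time can be compared side by side.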

Therefore, `cpuhp.parallel=1` reduces the HVC cost relative to the
unpatched kernel but suffers from a massive increase in IRQ exits, while
`cpuhp.parallel=0` avoids this interrupt storm and performs best in a
KVM guest.

>
> Michael
>
>>
>> Signed-off-by: Jinjie Ruan <ruanjinjie@xxxxxxxxxx>
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/smp.h | 8 ++++++++
>> arch/arm64/kernel/head.S | 23 +++++++++++++++++++++++
>> arch/arm64/kernel/smp.c | 27 +++++++++++++++++++++++++++
>> 4 files changed, 59 insertions(+)
>>
>
>