Re: [PATCH RFC 3/3] arm64: Add HOTPLUG_PARALLEL support for secondary CPUs
From: Will Deacon
Date: Tue Jun 23 2026 - 10:35:05 EST
On Mon, Jun 22, 2026 at 05:16:30PM +0800, Jinjie Ruan wrote:
>
>
> On 6/18/2026 8:21 PM, Will Deacon wrote:
> > Hi Jinjie,
> >
> > On Mon, Jun 15, 2026 at 04:51:48PM +0800, Jinjie Ruan wrote:
> >> On 6/12/2026 11:45 PM, Michael Kelley wrote:
> >>> From: Jinjie Ruan <ruanjinjie@xxxxxxxxxx> Sent: Thursday, June 11, 2026 6:38 AM
> >>>>
> >>>> Support for parallel secondary CPU bringup is already utilized by x86,
> >>>> MIPS, and RISC-V. This patch brings this capability to the arm64
> >>>> architecture.
> >>>>
> >>>> Rework the global `secondary_data` accessed during early boot into
> >>>> a per-CPU array. This array maps logical CPU IDs to MPIDR_EL1 values,
> >>>> enabling the early boot code in head.S to resolve each secondary CPU's
> >>>> logical ID concurrently.
> >>>>
> >>>> To fully enable HOTPLUG_PARALLEL, this patch implements:
> >>>> 1) An arm64-specific arch_cpuhp_kick_ap_alive() handler.
> >>>> 2) Callbacks to cpuhp_ap_sync_alive() inside secondary_start_kernel().
> >>>>
> >>>> Successfully tested on QEMU ARM64 virt machine (KVM on, 128 vCPUs).
> >>>>
> >>>> | test kernel        | secondary CPUs boot time |
> >>>> | ------------------ | ------------------------ |
> >>>> | Without this patch | 155.672                  |
> >>>> | cpuhp.parallel=0   | 62.897                   |
> >>>> | cpuhp.parallel=1   | 166.703                  |
> >>>
> >>> The last two rows seem mixed up. I would expect parallel=0 to
> >>> result in a longer boot time.
> >>
> >> Hi, Michael,
> >>
> >> The results are correct and not mixed up.
> >>
> >> Compared to the original non‑HOTPLUG_PARALLEL approach, the advantage of
> >> cpuhp.parallel=0 lies in its use of cpu_relax() (`yield` on arm64) instead
> >> of the wait_for_completion_timeout() mechanism (which may cause sleep
> >> and context switching). This significantly reduces the overhead of VM
> >> exits and context switches in a KVM guest, thereby cutting the secondary
> >> CPU boot time by more than half.
> >
> > I don't think that's a particularly compelling reason to enable this for
> > arm64, in all honesty. The yield instruction typically doesn't do
> > anything on actual arm64 silicon, so this probably means that you're
> > introducing busy-loops which tend to be bad for power and scalability.
> >
> > I implemented this a while ago [1] but didn't manage to see much in terms
> > of performance improvement and so I didn't bother to send the patches out
> > after talking about it at KVM forum [2]. However, as mentioned at the end
> > of that talk, it _is_ still useful for confidential VMs using PSCI so
> > let me dust off my old series and send it out to see what you think.
>
> Hi Will,
>
> Thanks for the insights! Your point about using PSCI v0.2's Context ID
> to avoid the NR_CPUS array for input parameters (like
> secondary_data.task) is incredibly elegant.
>
> However, if I understand your series correctly, it seems your approach
> primarily targets preventing the concurrent use of secondary_data.task,
> but it doesn't seem to account for the potential data trampling on
> secondary_data.status when multiple secondary CPUs are brought up
> simultaneously.
>
> update_cpu_boot_status()
> -> WRITE_ONCE(secondary_data.status.flags[val], 1)
>
> arch_cpuhp_cleanup_kick_cpu()
> -> status = READ_ONCE(secondary_data.status)
I need to dust the series back off, but IIRC I made secondary_data.status
a byte array, with a separate byte for each failure reason.
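Roughly along these lines, as a userspace sketch of the idea rather than the
actual patch. The enum names and both helper functions are made up for
illustration, and the volatile byte stores only stand in for WRITE_ONCE():

```c
#include <assert.h>

/* Hypothetical failure reasons; these names are illustrative only. */
enum boot_fail_reason {
	BOOT_FAIL_STUCK_IN_KERNEL,
	BOOT_FAIL_NO_GRAN,
	BOOT_FAIL_NO_VA_RANGE,
	BOOT_FAIL_NR_REASONS,
};

/* One byte per failure reason instead of a single shared status word. */
struct boot_status {
	volatile unsigned char flags[BOOT_FAIL_NR_REASONS];
};

static struct boot_status boot_status;

/*
 * Called by a failing secondary. Every writer stores the same value (1)
 * to the byte for its reason, so concurrent reports of different reasons
 * touch different bytes and repeated reports of one reason are idempotent.
 */
static void report_boot_failure(enum boot_fail_reason reason)
{
	assert(reason < BOOT_FAIL_NR_REASONS);
	boot_status.flags[reason] = 1;
}

/* Called by the boot CPU: fold the per-reason bytes into a bitmask. */
static unsigned int collect_boot_failures(void)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < BOOT_FAIL_NR_REASONS; i++) {
		if (boot_status.flags[i])
			mask |= 1u << i;
	}
	return mask;
}
```

With that layout no secondary can overwrite another's report, at the cost
that the reader learns which failures happened but not which CPU hit them.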
Will