Re: [patch 28/30] x86/microcode: Handle "offline" CPUs correctly

From: Ashok Raj
Date: Thu Aug 10 2023 - 17:49:48 EST


On Thu, Aug 10, 2023 at 11:05:11PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 10, 2023 at 01:50:17PM -0700, Ashok Raj wrote:
> > On Thu, Aug 10, 2023 at 10:46:05PM +0200, Peter Zijlstra wrote:
> > > On Thu, Aug 10, 2023 at 08:38:07PM +0200, Thomas Gleixner wrote:
> > >
> > > > for_each_cpu_and(cpu, cpu_present_mask, &cpus_booted_once_mask) {
> > > > + /*
> > > > + * Offline CPUs sit in one of the play_dead() functions
> > > > + * with interrupts disabled, but they still react on NMIs
> > > > + * and execute arbitrary code. Also MWAIT being updated
> > > > + * while the offline CPU sits there is not necessarily safe
> > > > + * on all CPU variants.
> > > > + *
> > > > + * Mark them in the offline_cpus mask which will be handled
> > > > + * by CPU0 later in the update process.
> > > > + *
> > > > + * Ensure that the primary thread is online so that it is
> > > > + * guaranteed that all cores are updated.
> > > > + */
> > > > if (!cpu_online(cpu)) {
> > > > + if (topology_is_primary_thread(cpu) || !allow_smt_offline) {
> > > > + pr_err("CPU %u not online, loading aborted\n", cpu);
> > >
> > > We could make the NMI handler do the ucode load, no? Also, you just need
> > > any thread online, don't particularly care about primary thread or not
> > > afaict.
> >
> > Patch 25 does that load in NMI. You are right, we just need "a" CPU in each
> > core online.
>
> Patch 25 does it for online CPUs, offline CPUs are having a separate
> code path:
>
> microcode_nmi_handler()
>
> vs
>
> microcode_offline_nmi_handler()

Since the code enforces all primary CPUs to be ONLINE, the secondaries are the
other thread of the same core. So they automatically get the update when
primary does it.

The secondaries are parked in NMI just to avoid the risk of executing code
that might be patched by primary.

Or maybe you had something else in mind.