Re: [PATCH 1/7] stop_machine: Introduce stop_machine_nmi()

From: Borislav Petkov

Date: Wed Mar 04 2026 - 11:39:32 EST


On Thu, Feb 05, 2026 at 06:14:39PM -0800, Chang S. Bae wrote:
> On 2/2/2026 2:54 AM, Borislav Petkov wrote:
> ...
> > @@ -174,8 +174,26 @@ struct multi_stop_data {
> > enum multi_stop_state state;
> > atomic_t thread_ack;
> > +
> > + bool use_nmi;
> > +
> > + /*
> > + * cpumasks of CPUs on which to raise an NMI; used in the NMI
> > + * stop_machine variant. nmi_cpus_done is used for tracking
> > + * when the NMI handler has executed successfully.
> > + */
> > + struct cpumask nmi_cpus;
> > + struct cpumask nmi_cpus_done;
> > +
> > +};
>
> Looks like every stop_machine variant then will spend stack for these masks.
> It seems they could be cpumask_var_t.

I guess...

> Alternatively, to make it simple further, a per-CPU variable could achieve
> this if I understand correctly:
>
> struct stop_machine_nmi_ctrl {
> ...
> bool done;
> }

The first mask, nmi_cpus, guards against the NMI handler running again. The
second one, nmi_cpus_done, checks whether all CPUs ran the NMI handler.

I guess simply checking whether the nmi_cpus mask is *not* empty would tell
us that too, so we are probably fine with a single mask only.
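
To illustrate the single-mask idea, here is a minimal userspace sketch. It is
not the patch: struct nmi_ctrl and the nmi_ctrl_*() helpers are hypothetical
names, and a 64-bit C11 atomic stands in for struct cpumask. Each handler
test-and-clears its own bit, which guards against a second run, and the waiter
treats an empty mask as "every targeted CPU got there". Note the assumption:
the bit is cleared when the handler claims it, so an empty mask means every
handler was entered, not necessarily that it has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a single nmi_cpus mask: one bit per CPU. */
struct nmi_ctrl {
	_Atomic uint64_t pending;
};

/* Mark the CPUs on which an NMI is about to be raised. */
static void nmi_ctrl_arm(struct nmi_ctrl *c, uint64_t cpus)
{
	atomic_store(&c->pending, cpus);
}

/*
 * Called from the (simulated) NMI handler. Returns true only for the
 * first invocation on an armed CPU; a repeated or stray NMI sees its
 * bit already cleared and returns false.
 */
static bool nmi_ctrl_claim(struct nmi_ctrl *c, unsigned int cpu)
{
	uint64_t bit, old;

	assert(cpu < 64);
	bit = UINT64_C(1) << cpu;

	/* atomic_fetch_and() returns the value before the bit was cleared. */
	old = atomic_fetch_and(&c->pending, ~bit);

	return (old & bit) != 0;
}

/* The waiter's check: no bits left means no CPU is still outstanding. */
static bool nmi_ctrl_all_done(struct nmi_ctrl *c)
{
	return atomic_load(&c->pending) == 0;
}
```

In this sketch the waiter would spin while the mask is *not* empty, which is
the check described above.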

> I don't know whether that was an intentional design choice or not. But, at
> least the NMI variant might have a slight different semantic in this regard.

This current behavior doesn't make a whole lot of sense to me, at least from
what I'm reading. I think it is clearly better if the caller gets told when
some NMI handler failed instead of having an error value overwritten.

But maybe we'll fix that while we're at it.
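
One possible way to avoid overwriting an error value, sketched in userspace C
with hypothetical names (struct nmi_result and the nmi_result_*() helpers are
assumptions for illustration, not anything from the patch): keep the first
non-zero return code and ignore later ones, so that a later success or a
later failure cannot hide an earlier error from the caller.

```c
#include <stdatomic.h>

/* Hypothetical shared result slot, written by every handler. */
struct nmi_result {
	atomic_int err;
};

static void nmi_result_init(struct nmi_result *r)
{
	atomic_init(&r->err, 0);
}

/*
 * Record a handler's return value. Only the first non-zero value is
 * stored: the compare-and-exchange succeeds only while err is still 0.
 */
static void nmi_result_record(struct nmi_result *r, int ret)
{
	int expected = 0;

	if (ret)
		atomic_compare_exchange_strong(&r->err, &expected, ret);
}

/* What the caller would get back once all handlers have run. */
static int nmi_result_get(struct nmi_result *r)
{
	return atomic_load(&r->err);
}
```

Whether "first error wins" is the right policy is a separate question; the
point is only that a failure, once recorded, is still visible to the caller.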

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette