Re: [PATCH 1/7] stop_machine: Introduce stop_machine_nmi()

From: Chang S. Bae

Date: Thu Feb 05 2026 - 21:15:54 EST


On 2/2/2026 2:54 AM, Borislav Petkov wrote:
...
@@ -174,8 +174,26 @@ struct multi_stop_data {
enum multi_stop_state state;
atomic_t thread_ack;
+
+ bool use_nmi;
+
+ /*
+ * cpumasks of CPUs on which to raise an NMI; used in the NMI
+ * stomp_machine variant. nmi_cpus_done is used for tracking
+ * when the NMI handler has executed successfully.
+ */
+ struct cpumask nmi_cpus;
+ struct cpumask nmi_cpus_done;
+
+};

Looks like every stop_machine variant then will spend stack for these masks. It seems they could be cpumask_var_t.

Alternatively, to make it simple further, a per-CPU variable could achieve this if I understand correctly:

struct stop_machine_nmi_ctrl {
...
bool done;
}

Then,

for_each_cpu(cpu, cpus)
per_cpu(stop_machine_nmi_ctrl.done, cpu) = false;

...
ret = stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
...

for_each_cpu(cpu, cpus) {
if (!per_cpu(stop_machine_nmi_ctrl.done, cpu)) {
pr_err("Some CPUs didn't run ...\n");
return -EINVAL;
}
}

Or, I guess even the msdata pointer alone could do that too -- setting it before and checking if NULL after.

+static int __stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
+ const struct cpumask *cpus, bool use_nmi)
{
struct multi_stop_data msdata = {
.fn = fn,
.data = data,
.num_threads = num_online_cpus(),
.active_cpus = cpus,
+ .use_nmi = use_nmi,
};
+ int ret, cpu;
+
+ if (use_nmi) {
+ cpumask_copy(&msdata.nmi_cpus, cpus);
+ cpumask_clear(&msdata.nmi_cpus_done);
+ }
lockdep_assert_cpus_held();
@@ -617,7 +677,32 @@ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data,
/* Set the initial state and stop all online cpus. */
set_state(&msdata, MULTI_STOP_PREPARE);
- return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
+ ret = stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
+
+ if (!use_nmi)
+ return ret;
+
+ if (!cpumask_equal(cpus, &msdata.nmi_cpus_done)) {
+ pr_err("Some CPUs didn't run the stomp_machine NMI handler\n");
+ return -EINVAL;
+ } else {
+ for_each_cpu(cpu, cpus)
+ ret |= per_cpu(stop_machine_nmi_ctrl.err, cpu);

This error accumulation here makes sense to me though,

Currently, stop_machine() documents:

/*
...
* Return: 0 if all invocations of @fn return zero. Otherwise, the
* value returned by an arbitrarily chosen member of the set of calls to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* @fn that returned non-zero.
*/

This behavior dates back to:

commit 1142d810298e694 ("cpu_stop: implement stop_cpu[s]()")

where cpu_stopper_thread() does:

struct cpu_stop_done *done = work->done;
...
ret = fn(arg);
if (ret)
done->ret = ret;

So the return value is overwritten instead of accumulation, while struct cpu_stop_done is shared like the note:

/*
* Structure to determine completion condition and record errors. May
* be shared by works on different cpus.
*/

I don't know whether that was an intentional design choice or not. But, at least the NMI variant might have a slight different semantic in this regard.