Re: [PATCH v2] smp: Document preemption and stop_machine() mutual exclusion

From: Paul E. McKenney
Date: Mon Jul 07 2025 - 11:56:11 EST


On Mon, Jul 07, 2025 at 09:50:50AM +0200, Peter Zijlstra wrote:
> On Sat, Jul 05, 2025 at 01:23:27PM -0400, Joel Fernandes wrote:
> > Recently while revising RCU's cpu online checks, there was some discussion
> > around how IPIs synchronize with hotplug.
> >
> > Add comments explaining how preemption disable creates mutual exclusion with
> > CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
> > atomically updates CPU masks and flushes IPIs with interrupts disabled, and
> > cannot proceed while any CPU (including the IPI sender) has preemption
> > disabled.
>
> I'm very conflicted on this. While the added comments aren't wrong,
> they're not quite accurate either. Stop_machine doesn't wait for people
> to enable preemption as such.
>
> Fundamentally there seems to be a misconception around what stop machine
> is and how it works, and I don't feel these comments make things better.
>
> Basically, stop-machine (and stop_one_cpu(), stop_two_cpus()) use the
> stopper task, a task running at the ultimate priority; if it is
> runnable, it will run.
>
> Stop-machine simply wakes all the stopper tasks and co-ordinates them to
> literally stop the machine. All CPUs have the stopper task scheduled and
> then they go sit in a spin-loop driven state machine with IRQs disabled.
>
> There really isn't anything magical about any of this.

There is the mechanism (which you have described above), and then there
are the use cases. Those of us maintaining a given mechanism might
argue that a detailed description of the mechanism suffices, but that
argument does not always win the day.

I do like the description in the stop_machine() kernel-doc header:

* This can be thought of as a very heavy write lock, equivalent to
* grabbing every spinlock in the kernel.

Though doesn't this need to upgrade "spinlock" to "raw spinlock"
now that PREEMPT_RT is in mainline?

Also, this function is more powerful than grabbing every write lock
in the kernel because it also excludes all regions of code that have
preemption disabled, which is one thing that CPU hotplug is relying on.
Any objection to calling out that additional semantic?
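
As a rough illustration of that additional semantic, here is a toy
userspace model. It is not kernel code, and every name in it (toy_cpu,
toy_wake_stoppers(), and so on) is made up for this sketch. It assumes
only what was described above: the stopper is an ordinary task at the
highest priority, so it runs on a CPU once that CPU can schedule, and the
machine counts as stopped only when every CPU is running its stopper.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy per-CPU state; none of this is kernel API. */
struct toy_cpu {
	int preempt_count;	/* > 0 means preemption is disabled */
	bool stopper_woken;	/* stopper task has been made runnable */
	bool in_stopper;	/* stopper task is running on this CPU */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

static void toy_preempt_disable(int cpu)
{
	toy_cpus[cpu].preempt_count++;
}

static void toy_preempt_enable(int cpu)
{
	assert(toy_cpus[cpu].preempt_count > 0);
	toy_cpus[cpu].preempt_count--;
}

/* Model of the wakeup step: make every CPU's stopper runnable. */
static void toy_wake_stoppers(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_cpus[cpu].stopper_woken = true;
}

/*
 * Model one scheduling opportunity on each CPU. A woken stopper gets
 * to run only where preemption is currently enabled.
 */
static void toy_schedule_all(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		struct toy_cpu *c = &toy_cpus[cpu];

		if (c->stopper_woken && c->preempt_count == 0)
			c->in_stopper = true;
	}
}

/* The machine is stopped only when every CPU is in its stopper. */
static bool toy_machine_stopped(void)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (!toy_cpus[cpu].in_stopper)
			return false;
	return true;
}

/* Returns 0 if the model behaves as described, -1 otherwise. */
static int toy_demo(void)
{
	toy_preempt_disable(1);		/* e.g. a sender that just checked a CPU mask */
	toy_wake_stoppers();
	toy_schedule_all();
	if (toy_machine_stopped())	/* CPU 1 has not run its stopper yet */
		return -1;

	toy_preempt_enable(1);
	toy_schedule_all();
	return toy_machine_stopped() ? 0 : -1;
}
```

In this model nothing waits on the preemption count as such. The
preempt-disabled region on CPU 1 simply delays that CPU's stopper, and
the all-CPUs rendezvous cannot complete until it runs.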

Thanx, Paul