Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Christoph Lameter
Date: Wed Feb 08 2017 - 22:16:32 EST


On Wed, 8 Feb 2017, Thomas Gleixner wrote:

> There is a world outside yours. Hotplug is actually used frequently for
> power purposes in some scenarios.

The usual case does not involve hotplug.

> It will improve nothing. The stop machine context is extremely limited and
> you cannot do complex things there at all. Not to talk about the inability
> of taking a simple mutex which would immediately deadlock the machine.

You do not need to do complex things. Basically flipping some cpu mask
bits will do it. Stop machine ensures that no code is executing on the
processors when the bits are flipped. That means there is no need for any
get_online_cpus() nastiness in critical VM paths, since we are guaranteed
not to be executing them at that point.

> And everything complex needs to be done _before_ that in normal
> context. Hot unplug already uses stop machine for the final removal of the
> outgoing CPU, but that's definitely not the place where you can do anything
> complex like page management.

If it already does that then why do we still need get_online_cpus()? We do
not do anything like page management. Why would we? We just need to ensure
that nothing is executing when the bits are flipped. If that is the case
then the get_online_cpus() calls are unnecessary because the bit flipping
simply cannot occur in these functions. There is nothing to serialize
against.

> If you can prepare the outgoing cpu work during the cpu offline phase and
> then just flip a bit in the stop machine part, then this might work, but
> anything else is just handwaving and proliferation of wet dreams.

Fine with that.