Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Christoph Lameter
Date: Wed Feb 08 2017 - 11:20:01 EST

On Wed, 8 Feb 2017, Michal Hocko wrote:

> I have no idea what you are trying to say and how this is related to the
> deadlock we are discussing here. We certainly do not need to add
> stop_machine to the problem. And yeah, dropping get_online_cpus was
> possible after considering all fallouts.

This is not the first time get_online_cpus() has caused problems due to the
need to support hotplug for processors. Hotplugging does not happen
frequently (which is lowballing it; the frequency of hotplug events on
almost all systems is actually zero), so the constant check is useless
overhead and causes trouble for development. In particular,
get_online_cpus() is often needed in sections that need to hold locks.

So let's get rid of it. Given how severe and how rare processor hotplug
events are, it would be justified to allow adding and removing processors
only through the stop_machine_xx mechanism. With that in place the
processor masks can be used without synchronization, and the locking issues
all over the kernel would become simpler.
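
To make the idea concrete, here is a rough userspace sketch, not kernel
code. Every model_* name is hypothetical and made up for illustration;
model_stop_machine() only stands in for the real stop_machine(), and the
model is single threaded, so it shows the shape of the proposal rather than
proving it safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model. None of these names are kernel APIs. */
static uint64_t model_online_mask = 0xF;    /* CPUs 0-3 start online */
static bool model_machine_stopped = false;

/* Stand-in for stop_machine(): run fn(arg) while nothing else runs. */
static int model_stop_machine(int (*fn)(void *), void *arg)
{
    model_machine_stopped = true;
    int ret = fn(arg);
    model_machine_stopped = false;
    return ret;
}

/* The only place that is allowed to change the online mask. */
static int model_take_cpu_down(void *arg)
{
    unsigned int cpu = *(unsigned int *)arg;

    assert(model_machine_stopped);
    model_online_mask &= ~(UINT64_C(1) << cpu);
    return 0;
}

/* Hotplug entry point: all mask updates go through the stopped state. */
static int model_cpu_down(unsigned int cpu)
{
    if (cpu >= 64)
        return -1;
    return model_stop_machine(model_take_cpu_down, &cpu);
}

/* Reader: walks the mask with no get/put pair and no lock taken. */
static unsigned int model_count_online(void)
{
    unsigned int n = 0;

    for (unsigned int cpu = 0; cpu < 64; cpu++)
        if (model_online_mask & (UINT64_C(1) << cpu))
            n++;
    return n;
}
```

Starting from four online CPUs, model_cpu_down(2) leaves three, and the
reader never takes anything that could end up in a lock ordering problem.
What the model does not capture is a reader that sleeps in the middle of a
mask walk; such a reader could still see the mask change underneath it, so
real code would have to account for that.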

It is likely that this will even improve the hotplug code, because the
easier form of synchronization (you have a piece of code that executes
while the OS is in a stopped state) would allow more significant changes
to the software environment. For example, one could think about removing
memory segments, and maybe per cpu segments as well.