Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Michal Hocko
Date: Tue Feb 07 2017 - 09:19:18 EST


On Tue 07-02-17 13:58:46, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote:
[...]
> > Anyway, shouldn't it be sufficient to disable preemption
> > in drain_local_pages_wq?
>
> That would be sufficient for a hot-removed CPU moving the drain request
> to another CPU and avoiding any scheduling events.
>
> > The CPU hotplug callback will not preempt us
> > and so we cannot work on the same cpus, right?
> >
>
> I don't see a specific guarantee that it cannot be preempted, and it
> would depend on the exact CPU hotplug implementation, which is subject
> to quite a lot of change.

But we do not care about the whole CPU hotplug code. The only part we
really care about is the race inside drain_pages_zone, and that runs
in atomic context on a specific CPU.

You are absolutely right that using the mutex is safe as well, but the
hotplug path is already littered with locks, and adding one more to the
picture doesn't sound great to me. So I would really prefer not to use a
lock if that is possible and safe (with a big fat comment, of course).
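Below is a rough sketch of the lock-free alternative. It assumes drain_local_pages_wq does nothing more than bracket the local drain with preempt_disable()/preempt_enable(). To keep it compilable outside the kernel, preempt_disable, preempt_enable, drain_local_pages and struct work_struct are hypothetical userspace stubs, and a single global counter stands in for preempt_count. The stub drain asserts that it is never entered with preemption enabled. The sketch only shows the shape of the change. It is not the actual mm/page_alloc.c code and does not model scheduling or CPU hotplug.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins for kernel primitives (assumption, not
 * the real kernel API implementation). A global counter models
 * preempt_count; nothing here reproduces real scheduling or CPU hotplug.
 */
struct zone;                        /* opaque, as in the kernel */
struct work_struct { int unused; }; /* stub for the workqueue item type */

static int preempt_count; /* > 0 means "preemption disabled" */
static int drains;        /* how many times the local drain ran */

static void preempt_disable(void) { preempt_count++; }

static void preempt_enable(void)
{
	assert(preempt_count > 0); /* catch unbalanced enable */
	preempt_count--;
}

/*
 * Stub for the per-CPU drain: it must only run with preemption disabled,
 * i.e. while pinned to the CPU it started on.
 */
static void drain_local_pages(struct zone *zone)
{
	(void)zone;
	assert(preempt_count > 0);
	drains++;
}

/*
 * The proposal, as argued in this thread: no extra lock, just stop the
 * worker from migrating for the duration of the drain. If the work item
 * was moved off a CPU being hot-removed, it drains whichever CPU it now
 * runs on; the point is that it cannot switch CPUs halfway through
 * drain_pages_zone.
 */
static void drain_local_pages_wq(struct work_struct *work)
{
	(void)work;
	preempt_disable();
	drain_local_pages(NULL);
	preempt_enable();
}
```

After two calls to drain_local_pages_wq(), drains is 2 and preempt_count is back to 0. The "big fat comment" would sit where the comment above drain_local_pages_wq is.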

--
Michal Hocko
SUSE Labs