Re: mm: deadlock between get_online_cpus/pcpu_alloc

From: Michal Hocko
Date: Tue Feb 07 2017 - 11:41:39 EST


On Tue 07-02-17 16:22:24, Mel Gorman wrote:
> On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> > > But we do not care about the whole cpu hotplug code. The only part we
> > > really do care about is the race inside drain_pages_zone and that will
> > > run in an atomic context on the specific CPU.
> > >
> > > You are absolutely right that using the mutex is safe as well but the
> > > hotplug path is already littered with locks and adding one more to the
> > > picture doesn't sound great to me. So I would really like to not use a
> > > lock if that is possible and safe (with a big fat comment of course).
> >
> > And with the full changelog. I hope I haven't missed anything this time.
> > ---
> > From 8c6af3116520251cc4ec2213f0a4ed2544bb4365 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Tue, 7 Feb 2017 16:08:35 +0100
> > Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside the
> > allocator
> >
> > <SNIP>
> >
> > Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Not that I can think of. It's almost identical to the diff I posted with
> the exception of the mutex in the cpu hotplug teardown path. I agree that
> in the current implementation it should be unnecessary, even if I
> thought it would be more robust against any other hotplug churn.

I am always nervous when seeing hotplug locks being used in low level
code. It has bitten us several times already and those deadlocks are
quite hard to spot when reviewing the code and very rare to hit, so they
tend to live for a long time.

> Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Thanks! I will wait for Tejun to confirm my assumptions are correct and
post the patch to Andrew if no further problems are spotted. Btw.
this will also get rid of another lockdep report, which seems to be a false
positive, though:
http://lkml.kernel.org/r/20170203145548.GC19325@xxxxxxxxxxxxxx

--
Michal Hocko
SUSE Labs