Re: possible deadlock in lru_add_drain_all
From: Michal Hocko
Date: Tue Oct 31 2017 - 11:46:12 EST
[CC David Herrmann for shmem_wait_for_pins. The thread starts at
http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@xxxxxxxxxx
with the call chains explained at
http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
For the shmem_wait_for_pins involvement see below.]
On Tue 31-10-17 16:25:32, Peter Zijlstra wrote:
> On Tue, Oct 31, 2017 at 02:13:33PM +0100, Michal Hocko wrote:
>
> > > I can indeed confirm it's running old code; cpuhp_state is no more.
> >
> > Does this mean the below chain is no longer possible with the current
> > linux-next (tip)?
>
> I see I failed to answer this; no, it will still happen, but now it reads like:
>
> s/cpuhp_state/&_up/
>
> Where we used to have a single lock protecting the hotplug stuff, we now
> have 2, one for bringing stuff up and one for tearing it down.
>
> This got rid of lock cycles that included cpu-up and cpu-down parts;
> those are false positives because we cannot do cpu-up and cpu-down
> concurrently.
>
> But this report only includes a single (cpu-up) part and therefore is
> not affected by that change other than a lock name changing.
Hmm, OK. I have quickly glanced through shmem_wait_for_pins and I fail
to see why it needs lru_add_drain_all at all. All we should care about
is the radix tree, and the lru cache only cares about the proper
placement on the LRU list, which is not checked here. I might be missing
something subtle though. David?
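For reference, the structure in question is roughly the following. This is
a trimmed sketch from memory rather than the exact mm/shmem.c code, and the
refcount re-check inside the tagged walk is elided:

static int shmem_wait_for_pins(struct address_space *mapping)
{
        struct radix_tree_iter iter;
        void **slot;
        int scan, error = 0;

        shmem_tag_pins(mapping);        /* tag pages which look pinned */

        for (scan = 0; scan <= LAST_SCAN; scan++) {
                if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
                        break;

                if (!scan)
                        lru_add_drain_all();    /* <-- the call in question */

                rcu_read_lock();
                radix_tree_for_each_tagged(slot, &mapping->page_tree,
                                           &iter, 0, SHMEM_TAG_PINNED) {
                        /*
                         * re-check page_count() vs page_mapcount() here and
                         * either clear the tag or remember -EBUSY
                         */
                }
                rcu_read_unlock();
        }
        return error;
}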
We've had some MM vs. hotplug issues. See e.g. a459eeb7b852 ("mm,
page_alloc: do not depend on cpu hotplug locks inside the allocator"),
so I suspect we might want/need to do something similar for
lru_add_drain_all. It feels like I've already worked on that but for
the life of me I cannot remember.
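To be concrete, the allocator-style direction for lru_add_drain_all could
look something like the sketch below. This is only an illustration of the
idea rather than a reviewed patch; the lru_drain_mutex/lru_drain_work names
are made up, and the correctness argument for a CPU going down (its pagevecs
being drained from the hotplug callback) would still have to be made, the
same way a459eeb7b852 argues it for the page allocator:

/* illustration only: serialize drainers instead of taking get_online_cpus() */
static DEFINE_MUTEX(lru_drain_mutex);
static DEFINE_PER_CPU(struct work_struct, lru_drain_work);

void lru_add_drain_all(void)
{
        static struct cpumask has_work;
        int cpu;

        mutex_lock(&lru_drain_mutex);
        cpumask_clear(&has_work);

        for_each_online_cpu(cpu) {
                struct work_struct *work = &per_cpu(lru_drain_work, cpu);

                INIT_WORK(work, lru_add_drain_per_cpu);
                queue_work_on(cpu, mm_percpu_wq, work);
                cpumask_set_cpu(cpu, &has_work);
        }

        /* only flush what we queued so a CPU coming up meanwhile is safe */
        for_each_cpu(cpu, &has_work)
                flush_work(&per_cpu(lru_drain_work, cpu));

        mutex_unlock(&lru_drain_mutex);
}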
Anyway, this lock dependency is subtle as hell and I am worried that we
might have way too many of those. We have so many callers of
get_online_cpus that dependencies like this are just waiting to blow up.
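Just to illustrate the shape of the problem (a contrived example, not a call
chain from this report): any lock which is taken both around
lru_add_drain_all and somewhere under the hotplug lock closes the cycle.

static DEFINE_MUTEX(A);                 /* stands in for i_mutex, slab_mutex, ... */

/* path 1: ordinary context */
static void path_one(void)
{
        mutex_lock(&A);
        lru_add_drain_all();            /* get_online_cpus() -> cpu_hotplug_lock */
        mutex_unlock(&A);
}

/* path 2: a cpuhp callback, i.e. runs with cpu_hotplug_lock already held */
static int path_two_online(unsigned int cpu)
{
        mutex_lock(&A);                 /* cpu_hotplug_lock -> A */
        mutex_unlock(&A);
        return 0;
}

Lockdep sees A -> cpu_hotplug_lock on the first path and cpu_hotplug_lock -> A
on the second, and every new caller of get_online_cpus adds more chances for
such a pair to exist.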
--
Michal Hocko
SUSE Labs