Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

From: Tetsuo Handa
Date: Thu Jan 25 2018 - 22:13:39 EST


Eric Wheeler wrote:
> Hi Tetsuo,
>
> Thank you for looking into this!
>
> I tried running this C program in 4.14.15 but did not get a deadlock, just
> OOM kills. Is the patch required to induce the deadlock?

This reproducer is not expected to trigger an actual deadlock. Running it
with this patch applied causes a lockdep warning. I was only trying to
suggest the possibility that suddenly making shrink_slab() a no-op might
cause unexpected results. We still don't know what is happening in your
case.

>
> Also, what are you doing to XFS to make it trigger?

Nothing.



Would you answer Michal's questions

  Is this a permanent state or does the holder eventually releases the lock?

  Do you remember the last good kernel?

and comment on my guess

  Since commit 0bcac06f27d75285 was not backported to 4.14-stable kernel,
  this is unlikely the bug introduced by 0bcac06f27d75285 unless Eric
  explicitly backported 0bcac06f27d75285.

?

Can you take SysRq-t (e.g. "echo t > /proc/sysrq-trigger") when processes
get stuck? I think that we need to know what other threads are doing while
__lock_page() is waiting in order to distinguish between "somebody forgot
to unlock the page" and "somebody is still doing something (e.g. waiting
for memory allocation) in order to unlock the page".
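
For reference, a minimal sketch of capturing the dump, assuming a root
shell and that dmesg can read the kernel log. The helper name sysrq_send
and the output file name are hypothetical, and the optional second
argument exists only so the function can be tried on a scratch file.

```shell
#!/bin/sh
# Sketch only: send one SysRq command letter, e.g. 't' for a dump of
# all tasks. 'sysrq_send' is a hypothetical helper, not an existing tool.
sysrq_send() {
    key=$1
    # Default to the real trigger file; a scratch path can be passed instead.
    trigger=${2:-/proc/sysrq-trigger}
    printf '%s\n' "$key" > "$trigger"
}

# Typical use as root while the processes are stuck:
#   sysrq_send t && dmesg > sysrq-t.log
```

If the task list is long, the kernel log buffer may wrap before dmesg
reads it, so a serial console or netconsole may be a safer way to keep
the whole dump.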

If you can take SysRq-t, doing so on a kernel with
http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx
applied and built with CONFIG_DEBUG_SHOW_MEMALLOC_LINE=y should give us
more clues (e.g. how long threads have been waiting for memory allocation).