Re: [RFC PATCH] mm: silence soft lockups from unlock_page

From: Qian Cai
Date: Tue Jul 21 2020 - 09:23:57 EST


On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote:
> On Tue 21-07-20 07:44:07, Qian Cai wrote:
> >
> >
> > > On Jul 21, 2020, at 7:25 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > Are these really important? I believe I can dig that out from the bug
> > > report but I didn't really consider that important enough.
> >
> > Please dig them out. We have also been running those things on
> > "large" powerpc as well and never saw such soft-lockups. Those
> > details may give us some clues about the actual problem.
>
> I strongly suspect this is not really relevant but just FYI this is
> 16Node, 11.9TB with 1536CPUs system.

Okay, we are now talking about the HPC special case. Just brain-storming some
ideas here.


1) What about increasing the soft-lockup threshold early at boot and restoring
it afterwards? As far as I can tell, those soft-lockups are just a few bursts
that cure themselves after booting.

2) Reading through the comments above page_waitqueue(), they say rare hash
collisions could happen, so it sounds like in this HPC case it is rather easy to
hit those hash collisions. Thus, do we need to deal with that instead?

3) The commit 62906027091f ("mm: add PageWaiters indicating tasks are waiting
for a page bit") mentioned that,

"Putting two bits in the same word opens the opportunity to remove the memory
barrier between clearing the lock bit and testing the waiters bit, after some
work on the arch primitives (e.g., ensuring memory operand widths match and
cover both bits)."

Do you happen to know if this only happens on powerpc? Also, we probably need to
dig out whether that memory barrier is still there and could be removed to speed
things up.

>
> > Once we
> > understand the problem better, we may judge if this "hack" is
> > really worth it.
>
> I do not have access to the machine so I can only judge from the boot
> log I have in hands. And from that it is pretty clear that
> $ grep BUG tmp/attachment.txt | wc -l
> 896
>
> $ grep BUG tmp/attachment.txt | grep "\[systemd" | wc -l
> 860
>
> $ grep do_fault+0x448 tmp/attachment.txt | wc -l
> 860
>
> that the boot struggles, lockups happen from udev workers and most of
> them are stuck at the very same place which is unlock_page. The rest is
> a part of the changelog.
> --
> Michal Hocko
> SUSE Labs