Re: "mm: account nr_isolated_xxx in [isolate|putback]_lru_page" breaks OOM with swap

From: Minchan Kim
Date: Thu Aug 01 2019 - 02:51:25 EST


On Wed, Jul 31, 2019 at 02:18:00PM -0400, Qian Cai wrote:
> On Wed, 2019-07-31 at 12:09 -0400, Qian Cai wrote:
> > On Wed, 2019-07-31 at 14:34 +0900, Minchan Kim wrote:
> > > On Tue, Jul 30, 2019 at 12:25:28PM -0400, Qian Cai wrote:
> > > > OOM workloads with swapping are unable to recover with linux-next since
> > > > next-20190729 due to the commit "mm: account nr_isolated_xxx in
> > > > [isolate|putback]_lru_page" [1].
> > > >
> > > > [1] https://lore.kernel.org/linux-mm/20190726023435.214162-4-minchan@kernel.org/T/#mdcd03bcb4746f2f23e6f508c205943726aee8355
> > > >
> > > > For example, the LTP oom01 test case is stuck for hours, while it finishes
> > > > in a few minutes here after reverting the above commit. Sometimes it
> > > > prints these messages while hanging.
> > > >
> > > > [  509.983393][  T711] INFO: task oom01:5331 blocked for more than 122 seconds.
> > > > [  509.983431][  T711]       Not tainted 5.3.0-rc2-next-20190730 #7
> > > > [  509.983447][  T711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  509.983477][  T711] oom01           D24656  5331   5157 0x00040000
> > > > [  509.983513][  T711] Call Trace:
> > > > [  509.983538][  T711] [c00020037d00f880] [0000000000000008] 0x8 (unreliable)
> > > > [  509.983583][  T711] [c00020037d00fa60] [c000000000023724] __switch_to+0x3a4/0x520
> > > > [  509.983615][  T711] [c00020037d00fad0] [c0000000008d17bc] __schedule+0x2fc/0x950
> > > > [  509.983647][  T711] [c00020037d00fba0] [c0000000008d1e68] schedule+0x58/0x150
> > > > [  509.983684][  T711] [c00020037d00fbd0] [c0000000008d7614] rwsem_down_read_slowpath+0x4b4/0x630
> > > > [  509.983727][  T711] [c00020037d00fc90] [c0000000008d7dfc] down_read+0x12c/0x240
> > > > [  509.983758][  T711] [c00020037d00fd20] [c00000000005fb28] __do_page_fault+0x6f8/0xee0
> > > > [  509.983801][  T711] [c00020037d00fe20] [c00000000000a364] handle_page_fault+0x18/0x38
> > >
> > > Thanks for the testing! It's no surprise the patch has some bugs,
> > > because it's rather tricky.
> > >
> > > Could you test this patch?
> >
> > It does help the situation a bit, but recovery is still much slower than
> > with the commit "mm: account nr_isolated_xxx in [isolate|putback]_lru_page"
> > simply reverted. For example, on this powerpc system oom01 used to take
> > 4 minutes to finish, while it still takes 13 minutes now.
> >
> > The oom02 test (testing NUMA mempolicy) takes even longer; I gave up after
> > 26 minutes with several hung tasks.
>
> Also, oom02 is stuck on an x86 machine.

Yep, my patch above had a bug: it tested the page type after the page was
freed. While reviewing it, I found other bugs as well, but I don't think
they are related to your problem either. Okay, then, let's revert the patch.

Andrew, could you revert the patch below?
"mm: account nr_isolated_xxx in [isolate|putback]_lru_page"

It's just a cleanup patch and isn't related to the new madvise hint system
call, so reverting it shouldn't be a blocker.

Anyway, I want to fix the problem when I have time.
Qian, what are your kernel config and system configuration on x86?
Is it possible to reproduce this in QEMU?
It would be really helpful if you could tell me the steps to reproduce it on x86.

Thanks.