Re: [PATCH V2] mm/gup: Clear the LRU flag of a page before adding to LRU batch

From: Chris Li
Date: Sun Aug 04 2024 - 13:52:54 EST


On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> > Hi Yu, I tested your patch. On my system (96 cores and 256G RAM), the
> > OOM still exists; the test memcg is limited to 512M and 32 threads ().
> >
> > And I found the OOM seems unrelated to either your patch or Ge's
> > patch (though they may have changed the chance of OOM slightly).
> >
> > After the very quick OOM (it failed to untar the linux source code),
> > checking lru_gen_full:
> > memcg 47 /build-kernel-tmpfs
> > node 0
> > 442 1691 29405 0
> > 0 0r 0e 0p 57r 617e 0p
> > 1 0r 0e 0p 0r 4e 0p
> > 2 0r 0e 0p 0r 0e 0p
> > 3 0r 0e 0p 0r 0e 0p
> > 0 0 0 0 0 0
> > 443 1683 57748 832
> > 0 0 0 0 0 0 0
> > 1 0 0 0 0 0 0
> > 2 0 0 0 0 0 0
> > 3 0 0 0 0 0 0
> > 0 0 0 0 0 0
> > 444 1670 30207 133
> > 0 0 0 0 0 0 0
> > 1 0 0 0 0 0 0
> > 2 0 0 0 0 0 0
> > 3 0 0 0 0 0 0
> > 0 0 0 0 0 0
> > 445 1662 0 0
> > 0 0R 34T 0 57R 238T 0
> > 1 0R 0T 0 0R 0T 0
> > 2 0R 0T 0 0R 0T 0
> > 3 0R 0T 0 0R 81T 0
> > 13807L 324O 867Y 2538N 63F 18A
> >
> > If I repeat the test many times, it may succeed by chance, but the
> > untar process is very slow and generates about 7000 generations.
> >
> > But if I change the untar cmdline to:
> > python -c "import sys; sys.stdout.buffer.write(open('$linux_src',
> > mode='rb').read())" | tar zx
> >
> > Then the problem is gone: the file untars successfully and very quickly.
> >
> > This might be a different issue from the one Chris reported, I'm not sure.
>
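
To make the per-generation numbers in the dump above easier to compare, here is a minimal Python sketch that pulls out the generation lines. It assumes the layout described in the MGLRU admin guide, where each generation line is `gen_nr age_in_ms nr_anon_pages nr_file_pages`, and it assumes the wrapped tier and stats lines have been rejoined so that only generation lines have exactly four integer fields. The function names and the abridged sample are illustrative only.

```python
# Abridged copy of the quoted dump, with wrapped lines rejoined (assumption).
SAMPLE = """\
memcg 47 /build-kernel-tmpfs
node 0
442 1691 29405 0
0 0r 0e 0p 57r 617e 0p
0 0 0 0 0 0
443 1683 57748 832
444 1670 30207 133
445 1662 0 0
13807L 324O 867Y 2538N 63F 18A
"""


def parse_generations(dump):
    """Return (gen_nr, age_ms, nr_anon_pages, nr_file_pages) tuples."""
    gens = []
    for line in dump.splitlines():
        fields = line.split()
        # Assumed: generation lines are the only ones with exactly four
        # plain integers; tier lines and the stats line have more fields.
        if len(fields) == 4 and all(f.isdigit() for f in fields):
            gens.append(tuple(int(f) for f in fields))
    return gens


def page_totals(gens):
    """Sum anon and file pages across all parsed generations."""
    anon = sum(g[2] for g in gens)
    file_ = sum(g[3] for g in gens)
    return anon, file_


if __name__ == "__main__":
    gens = parse_generations(SAMPLE)
    print(gens)
    print(page_totals(gens))
```

On the sample this gives 117360 anon pages and 965 file pages in total, so at an assumed 4 KiB page size nearly all of the charged memory is anon (tmpfs) rather than file.
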
> After more testing, I think there are two problems (note that I later
> changed the memcg limit to 600m so the compile test can run smoothly).
>
> 1. OOM during the untar process (can be worked around with the untar
> cmdline I mentioned above).

There are two different issues here.
My recent test script moved the untar phase out of the memcg limit
(mostly because I want a multithreaded untar), so the bisect I did
only catches the second one.
The untar issue might not be a regression from this patch.
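
For anyone who wants to reproduce the untar workaround quoted above without the shell pipe, a rough Python equivalent might look like the sketch below. It reads the whole archive into memory first (as `open(...).read()` does in the one-liner) and then extracts it through tarfile's stream mode. This is only an approximation: it does not invoke the `tar` binary, so the kernel-side behavior may differ from the original command, and the function name is made up for illustration.

```python
import io
import tarfile


def untar_via_memory(archive_path, dest_dir):
    """Read a .tar.gz fully into memory, then extract it as a stream."""
    # Slurp the compressed archive in one read, like the quoted one-liner.
    with open(archive_path, "rb") as f:
        data = f.read()
    # "r|gz" is tarfile's non-seekable stream mode, the closest analogue
    # of feeding the bytes to `tar zx` over a pipe.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tar:
        tar.extractall(dest_dir)
    # Return the number of compressed bytes that were read.
    return len(data)
```

Something like `untar_via_memory(linux_src, "/path/inside/memcg")` would be the usage, with both paths being placeholders.
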

> 2. OOM during the compile process (this should be the one Chris encountered).
>
> Both 1 and 2 only exist for MGLRU.
> 1 can be worked around using the cmdline I mentioned above.
> 2 is caused by Ge's patch, and 1 is not.
>
> I can confirm Yu's patch fixed 2 on my system, but 1 still seems to be
> a problem. It's not related to this patch and can maybe be discussed
> elsewhere.

I will do a test run now with Yu's patch and report back.

Chris