Re: [PATCH 0/3] make vm_committed_as_batch aware of vm overcommit policy

From: Feng Tang
Date: Wed May 27 2020 - 09:33:51 EST


Hi Qian,

On Wed, May 27, 2020 at 08:05:49AM -0400, Qian Cai wrote:
> On Wed, May 27, 2020 at 06:46:06PM +0800, Feng Tang wrote:
> > Hi Qian,
> >
> > On Tue, May 26, 2020 at 10:25:39PM -0400, Qian Cai wrote:
> > > > > > > [1] https://lkml.org/lkml/2020/3/5/57
> > > > > >
> > > > > > > Reverting this series fixed a warning under memory pressure.
> > > > >
> > > > > Andrew, Stephen, can you drop this series?
> > > > >
> > > > > >
> > > > > > [ 3319.257898] LTP: starting oom01
> > > > > > [ 3319.284417] ------------[ cut here ]------------
> > > > > > [ 3319.284439] memory commitment underflow
> > > >
> > > > Thanks for the catch!
> > > >
> > > > > Could you share some info about the platform, like the number of
> > > > > CPUs and RAM size, and the mmap size used by your test program?
> > > > > It would be great if you could point me to the test program.
> > >
> > > > I have reproduced this on both AMD and Intel. The test just
> > > > allocates memory and swaps.
> > >
> > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c
> > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c
> > >
> > > > It might be better to run the whole LTP mm test suite, which has quite
> > > > a few memory-pressure tests, if none of the above triggers it for you.
> > >
> > > /opt/ltp/runltp -f mm
> >
> > > Thanks for sharing. I tried to reproduce this on 2 server platforms
> > > but couldn't so far; they are still under testing.
> >
> > > Meanwhile, could you help try the patch below, which is based on
> > > Andi's suggestion and has some debug info. The warning is a little
> > > strange, as the condition is
> >
> > (percpu_counter_read(&vm_committed_as) <
> > -(s64)vm_committed_as_batch * num_online_cpus())
> >
> > > while for your platform (48 CPU + 128 GB RAM), the
> > > '-(s64)vm_committed_as_batch * num_online_cpus()'
> > > is an s64 value of '-32G', which makes the condition hard to satisfy.
> > > When it is true, it could be caused by some s32/s64 conversion quirk
> > > in the operations around the percpu-counter.
>
> > Below is the information on the AMD and powerpc systems affected by this.
> > It may take a bit of patience to reproduce, but our usual daily CI
> > triggers it eventually after a few tries.
>
> # git clone https://github.com/cailca/linux-mm.git
> # cd linux-mm
> # ./compile.sh
> # systemctl reboot
> # ./test.sh

I just downloaded it, and it failed on my desktop machine during the
'yum' and 'grub2' setup. The difficulty for me in reproducing this is that
the test platforms are behind the 0day framework, and I can hardly set up
external test suites, though I have been trying all day today :)

So if possible, please help try the patch in my last email. Thanks!
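
For reference, the '-32G' threshold quoted above can be reproduced with a
small userspace sketch. This is not the kernel code: it assumes 4 KiB pages
and assumes the series sizes the per-CPU batch as (RAM pages / CPUs / 4) for
policies other than OVERCOMMIT_NEVER. The helper names are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86-64 configurations. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: per-CPU batch size in pages. Assumes the batch is
 * (RAM pages / CPUs / 4) for policies other than OVERCOMMIT_NEVER, capped
 * at INT32_MAX, with a floor of max(2 * CPUs, 32).
 */
static int64_t sketch_batch_pages(uint64_t ram_bytes, int ncpus)
{
    assert(ncpus > 0);

    uint64_t ram_pages = ram_bytes / SKETCH_PAGE_SIZE;
    uint64_t memsized = ram_pages / (uint64_t)ncpus / 4;
    int64_t floor_batch = (2 * ncpus > 32) ? 2 * ncpus : 32;

    if (memsized > (uint64_t)INT32_MAX)
        memsized = INT32_MAX;

    return ((int64_t)memsized > floor_batch) ? (int64_t)memsized : floor_batch;
}

/* Right-hand side of the warning condition, expressed in pages. */
static int64_t sketch_underflow_threshold_pages(uint64_t ram_bytes, int ncpus)
{
    return -sketch_batch_pages(ram_bytes, ncpus) * (int64_t)ncpus;
}
```

With 128 GiB of RAM and 48 CPUs, sketch_underflow_threshold_pages(128ULL << 30, 48)
evaluates to -8388576 pages, which is just under 32 GiB worth of 4 KiB pages.
The counter would have to go that far negative before the comparison is true.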

- Feng