Re: [PATCH 0/3] OOM detection rework v4

From: Michal Hocko
Date: Thu Feb 25 2016 - 04:23:25 EST


On Wed 24-02-16 19:47:06, Hugh Dickins wrote:
[...]
> Boot with mem=1G (or boot your usual way, and do something to occupy
> most of the memory: I think /proc/sys/vm/nr_hugepages provides a great
> way to gobble up most of the memory, though it's not how I've done it).
>
> Make sure you have swap: 2G is more than enough. Copy the v4.5-rc5
> kernel source tree into a tmpfs: size=2G is more than enough.
> make defconfig there, then make -j20.
>
> On a v4.5-rc5 kernel that builds fine, on mmotm it is soon OOM-killed.
>
> Except that you'll probably need to fiddle around with that j20,
> it's true for my laptop but not for my workstation. j20 just happens
> to be what I've had there for years, that I now see breaking down
> (I can lower to j6 to proceed, perhaps could go a bit higher,
> but it still doesn't exercise swap very much).
>
> This OOM detection rework significantly lowers the number of jobs
> which can be run in parallel without being OOM-killed.

This all smells like a premature OOM caused by a high-order allocation
(order-2 for fork), which Tetsuo has already seen. Sergey Senozhatsky is
reporting order-2 OOMs as well. It is true that what we have in
mmotm right now is quite fragile if all order-N+ blocks are completely
depleted. That was the case for both Tetsuo and Sergey. I have tried to
mitigate this at least to some degree with
http://lkml.kernel.org/r/20160204133905.GB14425@xxxxxxxxxxxxxx (included
below with the full changelog), but I haven't heard back whether it
helped, so I haven't posted it as an official patch yet.

I also suspect that something is not quite right with compaction: it
gives up too early even though we have quite a lot of reclaimable pages.
I do not have any numbers for that because I haven't had a load that
reproduces this problem yet. I will try your setup and see what I can do
about it. It would be great if you could give the patch below a try
and see whether it helps.
---