Re: [PATCH] net: add big honking pfmemalloc OOM warning

From: Juha-Matti Tilli
Date: Thu Apr 11 2019 - 02:51:35 EST


On Wed, Apr 10, 2019 at 10:11 PM David Miller <davem@xxxxxxxxxxxxx> wrote:
> > SNMP counters are per netns, and more useful in the modern computing
> > era, where a host is shared by many different containers.
>
> +1 There is no way I am applying this patch.
>
> The kernel should not "big honking" anything in the logs.

Just to check, is the opposition to the patch related to the
expectation that it will log the condition too often despite the rate
limit, if many packets are dropped? Because if it is, that might be
possible to fix.

I think it might be possible to check the SNMP counter value, and if
zero, log the first instance of pfmemalloc drop, and then omit logging
afterwards. There could be race conditions, so in the absolute worst
case, you could have let's say 2 or 3 of these log lines instead of 1,
but I don't see that as an issue, because 99% of the time there would
be just one, and 2 or 3 lines won't fill the logs.
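
A rough userspace sketch of the idea, not kernel code: the counter
below is a hypothetical stand-in for the per-netns SNMP counter, and
the function name and message text are made up for illustration. The
check and the increment are deliberately separate steps, so two
concurrent callers could both see zero and both log, which is the
small race described above.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-netns pfmemalloc drop counter. */
static atomic_ulong pfmemalloc_drops;

/*
 * Record one pfmemalloc drop. Returns 1 if this call logged the
 * warning, 0 otherwise. The load and the increment are not one atomic
 * operation, so under concurrency a few callers may log before the
 * counter becomes non-zero.
 */
static int note_pfmemalloc_drop(void)
{
    int logged = 0;

    if (atomic_load(&pfmemalloc_drops) == 0) {
        /* First drop seen: log once, with the tuning hint. */
        fprintf(stderr,
                "pfmemalloc drop: consider raising vm.min_free_kbytes\n");
        logged = 1;
    }
    atomic_fetch_add(&pfmemalloc_drops, 1);
    return logged;
}
```

Called repeatedly from a single thread, only the first call logs;
later drops just bump the counter.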

In our case, the existence of such a log message and the helpful
suggestion to bump up vm.min_free_kbytes would have saved us
approximately one month of debugging (or 2-3 weeks if the SNMP counter
had been there in this kernel version). Even one such log message would
have been enough. Our production systems were hanging daily while this
debugging was going on.

In my opinion, the ideal count of pfmemalloc drops is exactly 0, and
the interesting event is the first instance of pfmemalloc drop
occurring.

If there's a bug in the kernel, I think the user should be notified,
so I see this as similar to a WARN_ON line -- which is an even more
"big honking" log event because it comes with a backtrace.

BR, Juha-Matti