Re: [PATCH] net/skbuff: silence warnings under memory pressure

From: Vlastimil Babka
Date: Mon Sep 02 2019 - 10:24:50 EST


On 8/30/19 5:25 PM, Qian Cai wrote:
> On Fri, 2019-08-30 at 17:11 +0200, Eric Dumazet wrote:
>>
>> On 8/30/19 4:57 PM, Qian Cai wrote:
>>> When running heavy memory pressure workloads, the system is throwing
>>> endless warnings below due to the allocation could fail from
>>> __build_skb(), and the volume of this call could be huge which may
>>> generate a lot of serial console output and cosumes all CPUs as
>>> warn_alloc() could be expensive by calling dump_stack() and then
>>> show_mem().
>>>
>>> Fix it by silencing the warning in this call site. Also, it seems
>>> unnecessary to even print a warning at all if the allocation failed in
>>> __build_skb(), as it may just retransmit the packet and retry.
>>>

Well, __GFP_NOWARN would save me from explaining this warning to users
many times. OTOH usually it's an indication that min_free_kbytes should
be raised to better cope with network traffic.

>>
>> Same patches are showing up there and there from time to time.
>>
>> Why is this particular spot interesting, against all others not adding
>> __GFP_NOWARN ?

This one is interesting that it's a GFP_ATOMIC allocation triggered by
incoming packets, and has a fallback mechanism. I don't recall other so
notoric ones.

>> Are we going to have hundred of patches adding __GFP_NOWARN at various points,
>> or should we get something generic to not flood the syslog in case of memory
>> pressure ?
>>
>
> From my testing which uses LTP oom* tests. There are only 3 places need to be
> patched. The other two are in IOMMU code for both Intel and AMD. The place is
> particular interesting because it could cause the system with floating serial
> console output for days without making progress in OOM. I suppose it ends up in
> a looping condition that warn_alloc() would end up generating more calls into
> __build_skb() via ksoftirqd.

Regardless of this particular allocation, if the reporting itself makes
the conditions so much worse, then at least some kind of general
ratelimit would make sense indeed.