Re: [PATCH] net/skbuff: silence warnings under memory pressure

From: Qian Cai
Date: Wed Sep 04 2019 - 07:59:22 EST


On Wed, 2019-09-04 at 10:25 +0200, Michal Hocko wrote:
> On Wed 04-09-19 16:00:42, Sergey Senozhatsky wrote:
> > On (09/04/19 15:41), Sergey Senozhatsky wrote:
> > > But the thing is different in case of dump_stack() + show_mem() +
> > > some other output. Because now we ratelimit not a single printk() line,
> > > but hundreds of them. The ratelimit becomes - 10 * $$$ lines in 5 seconds
> > > (IOW, now we talk about thousands of lines).
> >
> > And on devices with slow serial consoles this can be somewhat close to
> > "no ratelimit". *Suppose* that warn_alloc() adds 700 lines each time.
> > Within 5 seconds we can call warn_alloc() 10 times, which will add 7000
> > lines to the logbuf. If printk() can evict only 6000 lines in 5 seconds
> > then we have a growing number of pending logbuf messages.
>
> Yes, ratelimit is problematic when the ratelimited operation is slow. I
> guess that is a well known problem and we would need to rework both the
> api and the implementation to make it work in those cases as well.
> Essentially we need to make the ratelimit act as a gatekeeper to an
> operation section - something like a critical section except you can
> tolerate more code executions but not too many. So effectively
>
> start_throttle(rate, number);
> /* here goes your operation */
> end_throttle();
>
> one operation is not considered done until the whole section ends.
> Or something along those lines.
>
> In this particular case we can increase the rate limit parameters of
> course but I think that longterm we need a better api.

The problem is when a system is under heavy memory pressure, everything is
becoming slower, so I don't know how to come up with a sane default for rate
limit parameters as a generic solution that would work for every machine out
there. Sure, it is possible to set a limit as low as possible that would work
for the majority of systems apart from people may complain that they are now
missing important warnings, but using __GFP_NOWARN in this code would work for
all systems. You could even argument there is even a separate benefit that it
could reduce the noise-level overall from those build_skb() allocation failures
as it has a fall-back mechanism anyway.