Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms without eligible tasks
From: Michal Hocko
Date: Tue Oct 16 2018 - 05:20:49 EST
On Tue 16-10-18 09:55:06, Tetsuo Handa wrote:
> On 2018/10/15 22:35, Michal Hocko wrote:
> >> Nobody can prove that it never kills some machine. This is just one example result of
> >> one example stress test tried in my environment. Since I am a secure-programming person from
> >> the security subsystem, I really hate your "Can you trigger it?" resistance. Since this is the
> >> OOM path, which nobody tests, starting from being prepared for the worst case keeps things simple.
> >
> > There is simply no way to be generally safe in this kind of situation. As
> > soon as your console is so slow that you cannot push the oom report
> > through, there is only one single option left and that is to disable the
> > oom report altogether. And that might be a viable option.
>
> There is a way to be safe in this kind of situation. The way is to make sure that printk()
> is called with a sufficient interval. That is, count the interval between the end of the previous
> printk() messages and the beginning of the next printk() messages.
You are simply wrong, because any interval is meaningless without
knowing the printk throughput.
[...]
> lines on every page fault event. A kernel which consumes multiple milliseconds on each page
> fault event (due to printk() messages from the dysfunctional OOM killer) is stupid.
Not if it represents an unusual situation where there is no eligible
task available, because in that exceptional case the cost of
the printk is simply not relevant.
[...]
I am sorry to skip large part of your message but this discussion, like
many others, doesn't lead anywhere. You simply refuse to understand
some of the core assumptions in this area.
> Anyway, I'm OK if we apply _BOTH_ your patch and my patch. Or I'm OK with simplified
> one shown below (because you don't like per memcg limit).
My patch is adding a rate-limit! I really fail to see why we need yet
another one on top of it. This is just ridiculous. I can see reasons to
tune that rate limit but adding 2 different mechanisms is just wrong.
If your NAK on unifying the rate limit for dump_header across all paths
still holds, then I do not care enough to push it forward. But I find
this style of review feedback counterproductive.
--
Michal Hocko
SUSE Labs