Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms without eligible tasks

From: Michal Hocko
Date: Mon Oct 15 2018 - 04:19:41 EST


On Sat 13-10-18 20:28:38, Tetsuo Handa wrote:
> On 2018/10/13 20:22, Johannes Weiner wrote:
> > On Sat, Oct 13, 2018 at 08:09:30PM +0900, Tetsuo Handa wrote:
> >> ---------- Michal's patch ----------
> >>
> >> 73133 lines (5.79MB) of kernel messages per one run
> >>
> >> [root@ccsecurity ~]# time ./a.out
> >>
> >> real 3m44.389s
> >> user 0m0.000s
> >> sys 3m42.334s
> >>
> >> [root@ccsecurity ~]# time ./a.out
> >>
> >> real 3m41.767s
> >> user 0m0.004s
> >> sys 3m39.779s
> >>
> >> ---------- My v2 patch ----------
> >>
> >> 50 lines (3.40 KB) of kernel messages per one run
> >>
> >> [root@ccsecurity ~]# time ./a.out
> >>
> >> real 0m5.227s
> >> user 0m0.000s
> >> sys 0m4.950s
> >>
> >> [root@ccsecurity ~]# time ./a.out
> >>
> >> real 0m5.249s
> >> user 0m0.000s
> >> sys 0m4.956s
> >
> > Your patch is suppressing information that I want to have and my
> > console can handle, just because your console is slow, even though
> > there is no need to use that console at that log level.
>
> My patch is not suppressing information you want to have.
> My patch is mainly suppressing
>
> [ 52.393146] Out of memory and no killable processes...
> [ 52.395195] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
> [ 52.398623] Out of memory and no killable processes...
> [ 52.401195] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
> [ 52.404356] Out of memory and no killable processes...
> [ 52.406492] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
> [ 52.409595] Out of memory and no killable processes...
> [ 52.411745] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
> [ 52.415588] Out of memory and no killable processes...
> [ 52.418484] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
> [ 52.421904] Out of memory and no killable processes...
> [ 52.424273] a.out invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
>
> lines which Michal's patch cannot suppress.

This was a deliberate decision because the allocation failure context is
usually useful information to have. If this is killing a reasonably
configured machine then we can move the ratelimit up and suppress that
information as well. This will always be a cost vs. benefit decision,
and as such it should be argued in the changelog.
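
To illustrate what "moving the ratelimit up" would mean, here is a
minimal userspace sketch, not the actual kernel code. The names
rl_state, rl_allow and report_no_victim are made up for this example,
and the printed strings are placeholders. The point is only that the
ratelimit check comes before both messages, so the allocation context
is suppressed together with the "no killable processes" line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a ratelimit state (not the kernel struct). */
struct rl_state {
	long interval_ms;  /* length of one window */
	int burst;         /* reports allowed per window */
	long window_start; /* start of the current window */
	int printed;       /* reports emitted in this window */
	int missed;        /* reports suppressed in this window */
};

/* Return 1 if a report may be emitted at now_ms, 0 if it is suppressed. */
static int rl_allow(struct rl_state *rs, long now_ms)
{
	assert(rs->interval_ms > 0 && rs->burst > 0);

	/* Open a new window once the interval has elapsed. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}

/*
 * The ratelimit is checked first, so the allocation context line and
 * the "no killable processes" line are suppressed together.
 * Returns the number of lines printed.
 */
static int report_no_victim(struct rl_state *rs, long now_ms)
{
	if (!rl_allow(rs, now_ms))
		return 0;
	printf("<allocation failure context placeholder>\n");
	printf("Out of memory and no killable processes...\n");
	return 2;
}
```

With an assumed interval of 5000 ms and a burst of 1, the first report
prints both lines, and every further report within the same 5 seconds
prints nothing at all. The cost is that the allocation context of the
suppressed reports is lost, which is exactly the trade-off that would
have to be justified in the changelog.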

As so many dozens of times before, I will point you to the incremental
nature of changes we really prefer in the mm land. We are also after
simplicity, which your proposal lacks in many aspects. You seem to ignore
that general approach and I have a hard time considering your NAK
relevant feedback. Going to an extreme and basing a complex solution on
it is not going to fly. No killable process should be a rare event which
requires a seriously misconfigured memcg to happen so wildly. If you can
trigger it with normal user privileges then it would be a clear bug to
address rather than work around with printk throttling.
--
Michal Hocko
SUSE Labs