Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task

From: Johannes Weiner
Date: Tue Aug 21 2018 - 13:21:02 EST


I sent them in a separate thread. Thanks.

On Tue, Aug 21, 2018 at 04:06:12PM +0200, Michal Hocko wrote:
> Do you plan to repost these two? They are quite deep in the email thread
> so they can easily fall through the cracks.
>
> On Wed 08-08-18 18:17:37, Michal Hocko wrote:
> > On Wed 08-08-18 10:45:15, Johannes Weiner wrote:
> [...]
> > > > From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > Date: Wed, 8 Aug 2018 09:59:40 -0400
> > > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible
> > > victim left
> > >
> > > When the memcg OOM killer runs out of killable tasks, it currently
> > > prints a WARN with no further OOM context. This has caused some user
> > > confusion.
> > >
> > > Warnings indicate a kernel problem. In a reported case, however, the
> > > > situation was triggered by a nonsensical memcg configuration (hard
> > > limit set to 0). But without any VM context this wasn't obvious from
> > > the report, and it took some back and forth on the mailing list to
> > > identify what is actually a trivial issue.
> > >
> > > Handle this OOM condition like we handle it in the global OOM killer:
> > > dump the full OOM context and tell the user we ran out of tasks.
> > >
> > > This way the user can identify misconfigurations easily by themselves
> > > and rectify the problem - without having to go through the hassle of
> > > running into an obscure but unsettling warning, finding the
> > > appropriate kernel mailing list and waiting for a kernel developer to
> > > remote-analyze that the memcg configuration caused this.
> > >
> > > If users cannot make sense of why the OOM killer was triggered or why
> > > > it failed, they will still report it to the mailing list; we know that
> > > from experience. So in case there is an actual kernel bug causing
> > > this, kernel developers will very likely hear about it.
> > >
> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > Yes, this works as well. We would get a dump even for the race we have
> > seen, but I do not think this is something to lose sleep over. And if it
> > triggers often enough to be disturbing, we can add a
> > tsk_is_oom_victim(current) check there.
> >
> > Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> >
> > > ---
> > > mm/memcontrol.c | 2 --
> > > mm/oom_kill.c | 13 ++++++++++---
> > > 2 files changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 4e3c1315b1de..29d9d1a69b36 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
> > > >  	if (mem_cgroup_out_of_memory(memcg, mask, order))
> > > >  		return OOM_SUCCESS;
> > > >
> > > > -	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> > > > -	    "This looks like a misconfiguration or a kernel bug.");
> > > >  	return OOM_FAILED;
> > > >  }
> > >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 0e10b864e074..07ae222d7830 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc)
> > > }
> > >
> > > select_bad_process(oc);
> > > - /* Found nothing?!?! Either we hang forever, or we panic. */
> > > - if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) {
> > > + /* Found nothing?!?! */
> > > + if (!oc->chosen) {
> > > dump_header(oc, NULL);
> > > - panic("Out of memory and no killable processes...\n");
> > > + pr_warn("Out of memory and no killable processes...\n");
> > > + /*
> > > + * If we got here due to an actual allocation at the
> > > + * system level, we cannot survive this and will enter
> > > + * an endless loop in the allocator. Bail out now.
> > > + */
> > > + if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
> > > + panic("System is deadlocked on memory\n");
> > > }
> > > if (oc->chosen && oc->chosen != (void *)-1UL)
> > > oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
> > > --
> > > 2.18.0
> > >
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
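
For anyone skimming the quoted diff: the decision it leaves at the end of
out_of_memory() can be modeled in plain userspace C. This is only a sketch
of the control flow as it appears in the patch, not kernel code. The names
enum oom_origin, enum oom_action and decide() are hypothetical and exist
only for illustration, and the (void *)-1UL "abort" value of oc->chosen is
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for who invoked the OOM killer (not kernel types). */
enum oom_origin { ORIGIN_GLOBAL, ORIGIN_SYSRQ, ORIGIN_MEMCG };

/* Hypothetical summary of what the patched code path ends up doing. */
enum oom_action {
	ACT_KILL,		/* a victim was chosen, kill it */
	ACT_REPORT_ONLY,	/* dump_header() + pr_warn(), then carry on */
	ACT_REPORT_PANIC	/* dump_header() + pr_warn(), then panic() */
};

/*
 * Model of the patched logic: with no victim, the OOM header is always
 * dumped; only a real system-level allocation (neither sysrq nor memcg)
 * goes on to panic.
 */
static enum oom_action decide(bool have_victim, enum oom_origin origin)
{
	assert(origin == ORIGIN_GLOBAL || origin == ORIGIN_SYSRQ ||
	       origin == ORIGIN_MEMCG);

	if (have_victim)
		return ACT_KILL;
	if (origin == ORIGIN_GLOBAL)
		return ACT_REPORT_PANIC;
	return ACT_REPORT_ONLY;
}
```

In this model, decide(false, ORIGIN_MEMCG) yields ACT_REPORT_ONLY, which is
the case the patch is about: a memcg with no eligible task left now gets the
full OOM report instead of a bare WARN.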