Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges
From: Michal Hocko
Date: Fri Mar 23 2018 - 05:07:14 EST
On Thu 22-03-18 13:29:37, David Rientjes wrote:
> On Thu, 22 Mar 2018, Michal Hocko wrote:
[...]
> > They simply cannot because kmalloc performs the change under the cover.
> > So you would have to use kmalloc(gfp|__GFP_NORETRY) to be absolutely
> > sure to not trigger _any_ oom killer. This is just wrong thing to do.
> >
>
> Examples of where this isn't already done? It certainly wasn't a problem
> before __GFP_NORETRY was dropped in commit 2516035499b9 but you suspect
> it's a problem now.
It is not a problem _right now_ as I've already pointed out few
times. We do not trigger the OOM killer for anything but #PF path. But
this is an implementation detail which can change in future and there is
actually some demand for the change. Once we start triggering the oom
killer for all charges then we do not really want to have the disparity.
> > > You're diverging from it because the memcg charge path has never had this
> > > heuristic.
> >
> > Which is arguably a bug which just didn't matter because we do not
> > have costly order oom eligible charges in general and THP was subtly
> > different and turned out to be error prone.
> >
>
> It was inadvertently dropped from commit 2516035499b9. There were no
> high-order charge oom kill problems before this commit. People know how
> to use __GFP_NORETRY or leave it off, which you don't trust them to do
> because you're hardcoding a heuristic in the charge path.
No. Just read what I wrote. I am worried that the current disparity
between the page allocator and the memcg charging will _force_ them to
do hacks and sometimes (e.g. kmalloc) they will not have any option but
using __GFP_NORETRY even when that is not really needed and it has a
different semantic than they would like.
Behavior on with and without memcgs should be as similar as possible
otherwise you will see different sets of bugs when running under the
memcg and without. I really fail to see what is so hard about this to
understand.
[...]
> > > Your change is broken and I wouldn't push it to Linus for rc7 if my life
> > > depended on it. What is the response when someone complains that they
> > > start getting a ton of MEMCG_OOM notifications for every thp fallback?
> > > They will, because yours is a broken implementation.
> >
> > I fail to see what is broken. Could you be more specific?
> >
>
> I said MEMCG_OOM notifications on thp fallback. You modified
> mem_cgroup_oom(). What is called before mem_cgroup_oom()?
> mem_cgroup_event(mem_over_limit, MEMCG_OOM). That increments the
> MEMCG_OOM event and anybody waiting on the events file gets notified it
> changed. They read a MEMCG_OOM event. It's thp fallback, it's not memcg
> oom.
MEMCG_OOM doesn't count the number of oom killer invocations. That has
never been the case.
> Really, I can't continue to write 100 emails in this thread.
Then try to read and even try to understand concerns expressed in those
emails. Repeating the same set of arguments and ignoring the rest is not
really helpful.
--
Michal Hocko
SUSE Labs