Re: [patch] mm, thp: do not cause memcg oom for thp

From: Michal Hocko
Date: Wed Mar 21 2018 - 04:22:41 EST


On Tue 20-03-18 13:25:23, David Rientjes wrote:
> On Tue, 20 Mar 2018, Michal Hocko wrote:
>
> > > Commit 2516035499b9 ("mm, thp: remove __GFP_NORETRY from khugepaged and
> > > madvised allocations") changed the page allocator to no longer detect thp
> > > allocations based on __GFP_NORETRY.
> > >
> > > It did not, however, modify the mem cgroup try_charge() path to avoid oom
> > > kill for either khugepaged collapsing or thp faulting. It is never
> > > expected to oom kill a process to allocate a hugepage for thp; reclaim is
> > > governed by the thp defrag mode and MADV_HUGEPAGE, but allocations (and
> > > charging) should fallback instead of oom killing processes.
> >
> > For some reason I thought that the charging path simply bails out for
> > costly orders - effectively the same thing as for the global OOM killer.
> > But we do not. Is there any reason to not do that though? Why don't we
> > simply do
> >
>
> I'm not sure of the expectation of high-order memcg charging without
> __GFP_NORETRY,

It should be semantically compatible with the allocation path.

> I only know that khugepaged can now cause memcg oom kills
> when trying to collapse memory, and then subsequently found that the same
> situation exists for faulting instead of falling back to small pages.

And that is clearly a bug because page allocator doesn't oom kill while
the memcg charge does for the same gfp flag. That should be fixed.

> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d1a917b5b7b7..08accbcd1a18 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
> >
> > static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> > {
> > - if (!current->memcg_may_oom)
> > + if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> > return;
> > /*
> > * We are in the middle of the charge context here, so we
>
> That may make sense as an additional patch, but for thp allocations we
> don't want to retry reclaim nr_retries times anyway; we want the old
> behavior of __GFP_NORETRY before commit 2516035499b9.

Why? Allocation and the charge path should use the same gfp mask unless
there is a strong reason for it. If you have one then please mention it
in the changelog.

> So the above would be a follow-up patch that wouldn't replace mine.

Unless there is a strong reason to use different gfp mask for the
allocation and the charge then your fix is actually wrong.
--
Michal Hocko
SUSE Labs