Re: [patch] mm, thp: do not cause memcg oom for thp

From: Michal Hocko
Date: Wed Mar 21 2018 - 16:53:29 EST


On Wed 21-03-18 12:37:10, David Rientjes wrote:
> On Wed, 21 Mar 2018, Michal Hocko wrote:
>
> > > I'm not sure of the expectation of high-order memcg charging without
> > > __GFP_NORETRY,
> >
> > It should be semantically compatible with the allocation path.
> >
>
> That doesn't make sense, the allocation path needs to allocate contiguous
> memory for the high order, the charging path just needs to charge a number
> of pages. Why would the allocation and charging path be compatible when
> one needs to reclaim contiguous memory or compact memory and the the other
> just needs to reclaim any memory?

Because you do not want to see surprises. E.g. seeing unexpected OOMs
for large allocatations. Just think about it. Do you really want to have
a different reclaim policy for the allocation and charging for all
allocating paths? THP is by no means special. We do have different gfp
masks for THP to express how hard to try. Why should the charge path
behave any different?

You are right that the allocation path involves compaction and that is
different from the charging path. But that is an implementation detail
of the current implementation. Semantically, it is the gfp mask to tell
how hard to try and treating just because of how the current code works
is simply wrong.

> > > I only know that khugepaged can now cause memcg oom kills
> > > when trying to collapse memory, and then subsequently found that the same
> > > situation exists for faulting instead of falling back to small pages.
> >
> > And that is clearly a bug because page allocator doesn't oom kill while
> > the memcg charge does for the same gfp flag. That should be fixed.
> >
>
> It's fixed with my patch, yes.

Your patch only fixes up the current situation. Anytime a new THP
allocation emerges that code path has to be careful to add
__GFP_NORETRY to not regress again. That is just too error prone.

> The page allocator doesn't oom kill for
> orders over PAGE_ALLOC_COSTLY_ORDER only because it is unlikely to free
> order-4 and higher contiguous memory as a result; it's in the name, it's a
> costly order for the page allocator. Using it as a heuristic in the memcg
> charging path seems strange.

It is not strange at all. We have the concept that large allocations are
OK to fail rather than cause disruptive actions. And the same applies
for charges as well. There is no reason to over reclaim or even OOM kill
for a large charge if we have a fallback.

Seriously. Making different polices to the allocation and the memcg
charge will lead to both unexpected behavior and a maintenance mess.
And there is no good reason for that.
--
Michal Hocko
SUSE Labs