Re: [PATCH] memcg, thp: do not invoke oom killer on thp charges

From: Michal Hocko
Date: Wed Mar 21 2018 - 17:41:13 EST


On Wed 21-03-18 14:22:13, David Rientjes wrote:
> On Wed, 21 Mar 2018, Michal Hocko wrote:
>
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d1a917b5b7b7..08accbcd1a18 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1493,7 +1493,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg)
> >
> >  static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
> >  {
> > -	if (!current->memcg_may_oom)
> > +	if (!current->memcg_may_oom || order > PAGE_ALLOC_COSTLY_ORDER)
> >  		return;
> >  	/*
> >  	 * We are in the middle of the charge context here, so we
>
> What bug reports have you received about order-4 and higher-order non-thp
> charges that this fixes?

AFAIR we do not have any costly _OOM killable_ allocations other than
THP. Or am I missing any?

> The patch title and the changelog specifically single out thp, which I've
> fixed, since it has sane fallback behavior and everything else uses
> __GFP_NORETRY. I think this is misusing a page allocator heuristic that
> hasn't been applied to the memcg charge path before to address a thp
> regression but generalizing it for all charges.

Yes, which is the whole point! We do not want a THP-specific workaround.
Just look at the bug your original patch was fixing. The regression was
caused by a change which generalized gfp masks for THP because different
policies imply a different effort. As a side effect THP charges became
OOM killable. I would call that quite non-intuitive and error-prone.

> PAGE_ALLOC_COSTLY_ORDER is a heuristic used by the page allocator because
> it cannot free high-order contiguous memory. Memcg just needs to reclaim
> a number of pages. Two order-3 charges can cause a memcg oom kill but now
> an order-4 charge cannot. It's an unfair bias against high-order charges
> that are not explicitly using __GFP_NORETRY.

PAGE_ALLOC_COSTLY_ORDER is documented and people know what to expect
from such a request. Diverging from that behavior just comes as a
surprise. There is no reason for that and, as outlined above, it is
error-prone.
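
To make the disputed behavior concrete, here is a minimal user-space
sketch of the check the patch adds. It is not kernel code:
charge_may_oom() and charge_pages() are hypothetical helpers invented
for illustration, and the constant is redefined locally with the value
it has in include/linux/mmzone.h.

```c
#include <stdbool.h>

/* Assumption: mirrors the kernel's definition (value 3). */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical stand-in for the patched condition at the top of
 * mem_cgroup_oom(): the OOM path is only entered when the task is
 * allowed to OOM and the charge is not a costly order.
 */
static bool charge_may_oom(bool memcg_may_oom, int order)
{
	return memcg_may_oom && order <= PAGE_ALLOC_COSTLY_ORDER;
}

/* Number of base pages covered by a charge of the given order. */
static unsigned long charge_pages(int order)
{
	return 1UL << order;
}
```

With this check an order-3 charge (8 pages) can still end in a memcg
OOM kill, while an order-4 charge (16 pages, the same as two order-3
charges) or an order-9 THP charge on x86_64 simply fails. That is the
asymmetry being discussed above, and also the behavior people already
expect from costly orders in the page allocator.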

--
Michal Hocko
SUSE Labs