Re: [PATCH] memcg: effective memory.high reclaim for remote charging

From: Johannes Weiner
Date: Mon May 11 2020 - 11:57:09 EST


On Thu, May 07, 2020 at 10:00:07AM -0700, Shakeel Butt wrote:
> On Thu, May 7, 2020 at 9:47 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Thu 07-05-20 09:33:01, Shakeel Butt wrote:
> > [...]
> > > @@ -2600,8 +2596,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > schedule_work(&memcg->high_work);
> > > break;
> > > }
> > > - current->memcg_nr_pages_over_high += batch;
> > > - set_notify_resume(current);
> > > +
> > > + if (gfpflags_allow_blocking(gfp_mask))
> > > + reclaim_over_high(memcg, gfp_mask, batch);
> > > +
> > > + if (page_counter_read(&memcg->memory) <=
> > > + READ_ONCE(memcg->high))
> > > + break;
> >
> > I am half way to a long weekend so bear with me. Shouldn't this be continue? The
> > parent memcg might be still in excess even the child got reclaimed,
> > right?
> >
>
> The reclaim_high() actually already does this walk up to the root and
> reclaim from ones who are still over their high limit. Though having
> 'continue' here is correct too.

If reclaim was weak and failed to bring the child back in line, we
still do set_notify_resume(). We should do that for ancestors too.

But it seems we keep adding hierarchy walks and it's getting somewhat
convoluted: page_counter does it, then we check high overage
recursively, and now we add the call to reclaim which itself is a walk
up the ancestor line.

Can we hitchhike on the page_counter_try_charge() walk, which already
has the concept of identifying counters with overage? Rename the @fail
to @limited and return the first counter that is in excess of its high
as well, even when the function succeeds?

Then we could ditch the entire high checking loop here and simply
replace it with

done_restock:
...

if (*limited) {
if (gfpflags_allow_blocking())
reclaim_over_high(memcg_from_counter(limited));
/* Reclaim may not be able to do much, ... */
set_notify_resume(); // or schedule_work()
};

In the long-term, the best thing might be to integrate memory.high
reclaim with the regular reclaim that try_charge() is already
doing. Especially the part where it retries several times - we
currently give up on memory.high unnecessarily early. Make
page_counter_try_charge() fail on high and max equally, and after
several reclaim cycles, instead of invoking the OOM killer, inject the
penalty sleep and force the charges. OOM killing and throttling is
supposed to be the only difference between the two, anyway, and yet
the code diverges far more than that for no apparent reason.

But I also appreciate that this is a cleanup beyond the scope of this
patch here, so it's up to you how far you want to take it.