Re: [RFC PATCH] mm: vmscan: do not iterate all mem cgroups for global direct reclaim

From: Michal Hocko
Date: Thu Jan 24 2019 - 03:44:02 EST

On Wed 23-01-19 12:24:38, Yang Shi wrote:
> On 1/23/19 1:59 AM, Michal Hocko wrote:
> > On Wed 23-01-19 04:09:42, Yang Shi wrote:
> > > In the current implementation, both kswapd and direct reclaim have to
> > > iterate all mem cgroups. This was not a problem before offline mem
> > > cgroups could be iterated. But now that offline mem cgroups are iterated
> > > too, it can be very time consuming. In our workloads, we saw over 400K
> > > mem cgroups accumulated in some cases, of which only a few hundred are
> > > online memcgs. Although kswapd could help to reduce the number of memcgs,
> > > direct reclaim still gets hit with iterating a number of offline memcgs
> > > in some cases. We occasionally experienced responsiveness problems due
> > > to this.
> > Can you provide some numbers?
> What numbers do you mean? How long did it take to iterate all the memcgs?
> For now I don't have the exact number for the production environment, but
> the unresponsiveness is visible.

Yeah, I would be interested in the worst case direct reclaim latencies.
You can get that from our vmscan tracepoints quite easily.
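
Something along these lines would do for post-processing. This is only a
rough sketch: it assumes the trace has been dumped as text in the same
"comm pid [cpu] timestamp: event" layout as the perf output quoted below, and
the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
# The line layout is an assumption based on the perf output quoted in this thread.
EVENT = re.compile(
    r"^\s*(?P<comm>\S.*?)\s+(?P<pid>\d+)\s+\[\d+\]\s+"
    r"(?P<ts>\d+\.\d+):\s+vmscan:mm_vmscan_direct_reclaim_(?P<kind>begin|end)"
)


def direct_reclaim_latencies(lines):
    """Pair begin/end events per pid and return (pid, seconds) tuples."""
    started = {}      # pid -> timestamp of the pending begin event
    latencies = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        pid = int(m["pid"])
        ts = float(m["ts"])
        if m["kind"] == "begin":
            started[pid] = ts
        elif pid in started:
            latencies.append((pid, ts - started.pop(pid)))
    return latencies


if __name__ == "__main__":
    sample = [
        "              dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin",
        "              dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end",
    ]
    worst = max(direct_reclaim_latencies(sample), key=lambda item: item[1])
    print(f"worst case: pid {worst[0]} took {worst[1] * 1000:.1f} ms")
```

The worst case is then just the maximum over the returned list.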

> I have some test numbers from artificially triggering direct reclaim with
> 8k memcgs, each of which has just one clean page charged, so the reclaim
> is cheaper than in a real production environment.
> perf shows it took around 220ms to iterate 8k memcgs:
>               dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
>               dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end
> So, iterating 400K would take at least 11s in this artificial case. The
> production environment is much more complicated, so it would take much
> longer in fact.

Having real world numbers would definitely help with the justification.
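
For reference, the extrapolation from the artificial test works out as
follows. This is a back-of-the-envelope sketch only: it takes "8k" as 8000,
assumes the cost scales linearly with the number of memcgs, and the function
name is made up for illustration.

```python
def extrapolate_reclaim_time(begin_ts, end_ts, measured_memcgs, target_memcgs):
    """Scale one measured direct reclaim linearly to a larger memcg count."""
    elapsed = end_ts - begin_ts                # seconds for the measured run
    per_memcg = elapsed / measured_memcgs      # average cost of one memcg
    return per_memcg * target_memcgs


if __name__ == "__main__":
    # Timestamps from the quoted perf output; 8000 is an assumed reading of "8k".
    estimate = extrapolate_reclaim_time(578.542919, 578.758689, 8_000, 400_000)
    print(f"estimated time for 400K memcgs: {estimate:.1f} s")
```

The measured run is roughly 216ms, so 400K memcgs come out just under 11s,
which is a lower bound given the single clean page per memcg in the test.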

> > > Here just break the iteration once enough pages have been reclaimed,
> > > as memcg direct reclaim does. This may hurt the fairness among memcgs
> > > since direct reclaim may always reclaim from the same memcgs. But, it
> > > sounds ok since direct reclaim just tries to reclaim SWAP_CLUSTER_MAX
> > > pages and memcgs can be protected by min/low.
> > OK, this makes some sense to me. The purpose of the direct reclaim is
> > to reclaim some memory and throttle the allocation pace. The iterator is
> > cached so the next reclaimer on the same hierarchy will simply continue
> > so the fairness should be more or less achieved.
> Yes, you are right. I missed this point.
> >
> > Btw. is there any reason to keep !global_reclaim() check in place? Why
> > is it not sufficient to exclude kswapd?
> Iterating all memcgs in kswapd is still useful to help to reduce those
> zombie memcgs.

Yes, but for that you do not need to check for global_reclaim(), right?
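
To illustrate what I mean, here is a toy model, not kernel code. The class
and its fields are made up; it only mimics a cached hierarchy iterator where
any reclaimer other than kswapd stops once it has reclaimed enough, while
kswapd always finishes the full walk. Every visited memcg gives up all of its
pages, which is a simplification.

```python
SWAP_CLUSTER_MAX = 32  # direct reclaim target used in this toy model


class MemcgWalker:
    """Toy stand-in for a shared, cached memcg hierarchy iterator (hypothetical)."""

    def __init__(self, pages_per_memcg):
        self.pages = list(pages_per_memcg)  # reclaimable pages in each memcg
        self.pos = 0                        # cached position shared by all reclaimers

    def reclaim(self, nr_to_reclaim, is_kswapd):
        """Return (pages reclaimed, memcgs visited) for one reclaim pass."""
        reclaimed = 0
        visited = 0
        while visited < len(self.pages):
            idx = self.pos
            self.pos = (self.pos + 1) % len(self.pages)
            visited += 1
            reclaimed += self.pages[idx]
            self.pages[idx] = 0
            # The only exclusion is kswapd; no global vs. memcg reclaim check.
            if not is_kswapd and reclaimed >= nr_to_reclaim:
                break
        return reclaimed, visited


if __name__ == "__main__":
    walker = MemcgWalker([1] * 100)
    print(walker.reclaim(SWAP_CLUSTER_MAX, is_kswapd=False))  # (32, 32)
    print(walker.reclaim(SWAP_CLUSTER_MAX, is_kswapd=False))  # (32, 32), continues at 32
    print(walker.reclaim(SWAP_CLUSTER_MAX, is_kswapd=True))   # (36, 100), full walk
```

The second direct reclaimer picks up where the first one stopped, and kswapd
still visits every memcg.
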
Michal Hocko