Re: [RFC PATCH 00/27] Move LRU page reclaim from zones to nodes v2

From: Mel Gorman
Date: Wed Feb 24 2016 - 05:47:10 EST


On Tue, Feb 23, 2016 at 04:12:01PM -0800, Johannes Weiner wrote:
> > > > > If reclaim can't guarantee a balanced zone utilization then the
> > > > > allocator has to keep doing it. :(
> > > >
> > > > That's the key issue - the main reason balanced zone utilisation is
> > > > necessary is because we reclaim on a per-zone basis and we must avoid
> > > > page aging anomalies. If we balance such that one eligible zone is above
> > > > the watermark then it's less of a concern.
> > >
> > > Yes, but only if there can't be extended reclaim stretches that prefer
> > > the pages of a single zone. Yet it looks like this is still possible.
> >
> > And that is a problem if a workload is dominated by allocations
> > requiring the lower zones. If that is the common case then it's a bust
> > and fair zone allocation policy is still required. That removes one
> > motivation from the series as it leaves some fatness in the page
> > allocator paths.
>
> With your above explanations, I'm now much more confident this series
> is doing the right thing. Thanks.
>
> The uncertainty over low-zone allocation floods is real, but what is
> also unsettling is that, where the fair zone code used to shield us
> from kswapd changes, we now open ourselves up to subtle aging bugs,
> which are no longer detectable via the zone placement statistics. And
> we have changed kswapd around quite extensively in the recent past.
>
> A good metric for aging distortion might be able to mitigate both
> these things. Something to keep an eye on when making changes to
> kswapd, or when analyzing performance problems with a workload.
>
> What I have in mind is per-classzone counters of reclaim work. If we
> had exact numbers on how much zone-restricted reclaim is being done
> relative to unrestricted scans, we could know how severely the aging
> process is being distorted under any given workload. That would allow
> us to validate these changes here, future kswapd and allocator
> changes, and help us identify problematic workloads.
>

Ok, that makes me think that I should keep the per-zone pgscan figures
even if they are based on node LRU reclaim because we'll know what the
per-zone scan activity is. We already know how many pages get skipped
when reclaiming for lower zones.

> And maybe we can change the now useless pgalloc_ stats from counting
> zone placement to counting allocation requests by classzone.

I can't convince myself about this one way or the other.

> We could
> then again correlate the number of requests to the amount of work
> done. A high amount of restricted reclaim on behalf of mostly Normal
> allocation requests would detect the bug I described above, e.g. And
> we could generally tell how expensive restricted allocations are in
> the new node-LRUs.
>

I keep thinking the skip statistics give us similar data -- they do
not tell us how many restricted allocations resulted in reclaim, but
we do get an idea of the amount of work caused.

I'll think about it some more and see what I come up with.

--
Mel Gorman
SUSE Labs