Re: [PATCH 00/34] Move LRU page reclaim from zones to nodes v9

From: Andrea Arcangeli
Date: Fri Aug 19 2016 - 09:55:23 EST


On Fri, Aug 19, 2016 at 03:23:20PM +0200, Vlastimil Babka wrote:
> What's that? Never heard of this before, but sounds scary :) I thought
> that zone_reclaim itself was rather discouraged nowadays, not a big
> candidate for further improvement.

It's a fix that I tried to push upstream but that wasn't merged. I kept
maintaining it because I got customer bug reports about THP causing
regressions to node_reclaim.

Hard NUMA bindings would solve that, but apparently there are apps that
prefer no memory binding to allow flexible spillover. They use CPU
bindings only, but with a strong NUMA bias provided by node_reclaim,
which shrinks the cache (and only the cache).

In any case it was a regression caused by THP because compaction
wasn't invoked. Note zone_reclaim has a synchronous, more aggressive
option that blocks for writeback if needed, so invoking direct
compaction there is surely ok, if it's asked for on demand.

As usual it's a tradeoff between long-lived and short-lived
allocations, so if you reserve a system for computations and you know
your allocations are very long lived, it makes perfect sense to be
aggressive if you tune for it.

zone_reclaim and synchronous direct compaction are obviously bad
defaults for general purpose settings, but that doesn't mean it
should be impossible to tune a system to run a certain workload
optimally.

> Hm I'm not so sure. Are all movable allocations highmem? For example
> Joonsoo mentions in his ZONE_CMA patchset "blockdev file cache page
> [...] usually has __GFP_MOVABLE but not __GFP_HIGHMEM and __GFP_USER".
> Now we also have Minchan's infrastructure for arbitrary driver
> compaction, so those will be movable, but potentially still restricted
> to e.g. DMA32...

One option is to forbid such corner cases... and VM_WARN_ON (not a
typo :) available in my tree) if __GFP_MOVABLE is passed on lower
classzones.
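
A minimal userspace sketch of that kind of check, not the code from my
tree. The flag values and the toy_* names are made up for illustration,
and it assumes "lower classzone" simply means __GFP_HIGHMEM is absent,
which matches the blockdev page cache example quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy flag bits for illustration only, NOT the kernel's real values. */
#define TOY_GFP_DMA32   0x1u
#define TOY_GFP_HIGHMEM 0x2u
#define TOY_GFP_MOVABLE 0x4u

static_assert((TOY_GFP_DMA32 | TOY_GFP_HIGHMEM | TOY_GFP_MOVABLE) == 0x7u,
              "toy flags must be distinct bits");

/* Assumption: "lower classzone" means the highmem bit is not set. */
static bool toy_movable_in_low_classzone(unsigned int gfp)
{
    return (gfp & TOY_GFP_MOVABLE) && !(gfp & TOY_GFP_HIGHMEM);
}

/* Userspace stand-in for a VM_WARN_ON-style check: warn, then report. */
static bool toy_warn_on_low_movable(unsigned int gfp)
{
    bool bad = toy_movable_in_low_classzone(gfp);

    if (bad)
        fprintf(stderr,
                "warning: movable allocation in lower classzone (gfp=0x%x)\n",
                gfp);
    return bad;
}
```

The check only warns and never fails the allocation, so offenders can be
found and fixed one at a time.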

The other option would be to have per-classzone lowpfn, highpfn scan
pointers. That has some cons but hey, this whole thing is a tradeoff,
isn't it?
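
A rough userspace sketch of what per-classzone scan pointers could look
like. The struct, the zone list and the toy_* names are hypothetical,
and it assumes the zones of a node are contiguous and sorted by
ascending pfn, which is not guaranteed on real hardware:

```c
#include <assert.h>

enum toy_zone { TOY_DMA32, TOY_NORMAL, TOY_MOVABLE, TOY_NR_ZONES };

/* Half-open pfn range [start_pfn, end_pfn) covered by one zone. */
struct toy_zone_span {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* One pair of scan pointers per classzone, kept in the node. */
struct toy_node_compact {
    unsigned long low_pfn[TOY_NR_ZONES];  /* migrate scanner, moves up */
    unsigned long high_pfn[TOY_NR_ZONES]; /* free scanner, moves down */
};

/*
 * Reset the scanners: for classzone i the scan window starts at the
 * bottom of the node and ends at the top of zone i, so it crosses
 * every zone boundary below the classzone.
 */
static void toy_reset_scanners(struct toy_node_compact *nc,
                               const struct toy_zone_span spans[TOY_NR_ZONES])
{
    for (int i = 0; i < TOY_NR_ZONES; i++) {
        assert(spans[i].start_pfn <= spans[i].end_pfn);
        nc->low_pfn[i] = spans[0].start_pfn;
        nc->high_pfn[i] = spans[i].end_pfn;
    }
}
```

One of the cons is visible here: every classzone carries its own
scanner state, so there is more state to keep in the node and to reset.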

It's about the fact that we're optimizing for less frequent lowmem
allocations, so we can as well provide a worse compaction for lowmem
(by reducing the MOVABLE memory restricted to lower classzones, as
mentioned above), but leverage the node model to have a more powerful
compaction that crosses all zone boundaries when GFP_HIGHUSER is used.

I don't see why the tradeoff is valid when it comes to the LRU but not
valid when it comes to compaction, so that I then have to do a blind
loop of
(for-each-zone-in-the-node-in-reverse { compact_zone_order(zone) })
which works worse than before and works worse than a
zone-boundary-less compaction based on the node model.
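
For reference, the shape of that blind loop as a userspace toy. The
callback type and the toy_* names are hypothetical stand-ins for
compact_zone_order(), and stopping at the first zone that succeeds is
an assumption:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_ZONES 3

/* Hypothetical per-zone compaction hook: true if the order was satisfied. */
typedef bool (*toy_compact_fn)(int zone_idx, int order);

/*
 * Walk the zones of a node from highest to lowest and compact each one
 * in isolation. Returns the index of the first zone that succeeded,
 * or -1 if none did. Pages are never migrated across a zone boundary.
 */
static int toy_compact_node_by_zone(toy_compact_fn compact_zone, int order)
{
    assert(compact_zone);
    for (int z = TOY_NR_ZONES - 1; z >= 0; z--) {
        if (compact_zone(z, order))
            return z;
    }
    return -1;
}

/* Example hook: pretend only the lowest zone can satisfy the request. */
static bool toy_only_lowest_zone_succeeds(int zone_idx, int order)
{
    (void)order;
    return zone_idx == 0;
}

/* Example hook: every zone is too fragmented on its own. */
static bool toy_no_zone_succeeds(int zone_idx, int order)
{
    (void)zone_idx;
    (void)order;
    return false;
}
```

In the second case a single pass over the whole node might still
succeed by moving pages from one zone into free space in another, which
is what the per-zone loop cannot do.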