Re: [patch 3/3] mm: page_alloc: fair zone allocator policy

From: Andrea Arcangeli
Date: Mon Jul 29 2013 - 13:48:53 EST


Hi Johannes,

On Fri, Jul 19, 2013 at 04:55:25PM -0400, Johannes Weiner wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index af1d956b..d938b67 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1879,6 +1879,14 @@ zonelist_scan:
> if (alloc_flags & ALLOC_NO_WATERMARKS)
> goto try_this_zone;
> /*
> + * Distribute pages in proportion to the individual
> + * zone size to ensure fair page aging. The zone a
> + * page was allocated in should have no effect on the
> + * time the page has in memory before being reclaimed.
> + */
> + if (atomic_read(&zone->alloc_batch) <= 0)
> + continue;
> + /*
> * When allocating a page cache page for writing, we
> * want to get it from a zone that is within its dirty
> * limit, such that no single zone holds more than its

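I rebased the zone_reclaim_mode and compaction fixes on top of the
zone fair allocator (luckily it applied without rejects), but the hunk
above breaks zone_reclaim_mode: a local zone whose alloc_batch has
dropped to zero is skipped, so the allocation falls through to the next
zonelist entry, which may be on a remote node. Pagecache regresses too
(it currently works), and in turn my THP/compaction tests break as well.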
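To make the regression concrete, here is a minimal C sketch of a fast path that applies the quoted `alloc_batch <= 0` skip. It is a toy model, not page_alloc.c: `struct sim_zone`, `fair_fast_path()` and `count_remote()` are hypothetical names, the watermark check is reduced to "has free pages", and the zonelist is assumed to list the local node's zones first.

```c
#include <assert.h>

/* Toy model of a zone; not the kernel's struct zone. */
struct sim_zone {
    int node;          /* NUMA node the zone belongs to */
    long free_pages;   /* pages available without reclaim */
    long alloc_batch;  /* fair-policy budget, like zone->alloc_batch */
};

/*
 * One fast-path walk over the zonelist (local node first), skipping
 * zones whose batch is used up, as the quoted hunk does. Returns the
 * index of the zone that served the page, or -1 if the walk fails.
 */
int fair_fast_path(struct sim_zone *zl, int nr)
{
    assert(nr > 0); /* a zonelist always has at least one zone */
    for (int i = 0; i < nr; i++) {
        if (zl[i].alloc_batch <= 0)
            continue;           /* the new fairness check */
        if (zl[i].free_pages <= 0)
            continue;           /* stand-in for the watermark check */
        zl[i].free_pages--;
        zl[i].alloc_batch--;
        return i;
    }
    return -1;
}

/* Allocate n pages and count how many are served off local_node. */
int count_remote(struct sim_zone *zl, int nr, int local_node, int n)
{
    int remote = 0;
    for (int k = 0; k < n; k++) {
        int z = fair_fast_path(zl, nr);
        if (z >= 0 && zl[z].node != local_node)
            remote++;
    }
    return remote;
}
```

With one zone per node and a batch of 4 in each, the 5th and 6th of six allocations land on node 1 while node 0 still has 996 free pages, which is the placement zone_reclaim_mode is meant to avoid.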

zone_reclaim_mode isn't LRU-fair, and cannot be... (even migrating
cache between nodes to try to keep LRU fairness would not be worth it,
especially with SSDs). But we can still increase the fairness among
the zones of the current node (for nodes that have more than one
zone).

I think to fix it we need an additional first pass of the fast path,
used only when zone_reclaim_mode is enabled: if alloc_batch is <= 0 for
any zone in the current node, the first pass forbids allocating from
the zones not in the current node (even if their alloc_batch would
allow it). If the first pass fails, we need to reset alloc_batch for
all zones in the current node (and only in the current node), goto
zonelist_scan and continue as we do now.
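The proposed first pass can be sketched in C under toy-model assumptions. This is not the actual page_alloc.c code: `struct sim_zone`, `alloc_two_pass()` and the per-zone `batch_size` (standing in for whatever value alloc_batch is refilled to) are hypothetical, the watermark check is reduced to "has free pages", and the zonelist is assumed to list the local node's zones first.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a zone; not the kernel's struct zone. */
struct sim_zone {
    int node;          /* NUMA node the zone belongs to */
    long free_pages;   /* pages available without reclaim */
    long alloc_batch;  /* fair-policy budget, like zone->alloc_batch */
    long batch_size;   /* hypothetical refill value, proportional to zone size */
};

/* True if any zone on local_node has used up its batch. */
static bool local_batch_exhausted(const struct sim_zone *zl, int nr, int local_node)
{
    for (int i = 0; i < nr; i++)
        if (zl[i].node == local_node && zl[i].alloc_batch <= 0)
            return true;
    return false;
}

/* One walk over the zonelist; local_only skips zones on other nodes. */
static int scan(struct sim_zone *zl, int nr, int local_node, bool local_only)
{
    for (int i = 0; i < nr; i++) {
        if (local_only && zl[i].node != local_node)
            continue;           /* remote zones forbidden in this pass */
        if (zl[i].alloc_batch <= 0)
            continue;           /* fairness check from the patch */
        if (zl[i].free_pages <= 0)
            continue;           /* stand-in for the watermark check */
        zl[i].free_pages--;
        zl[i].alloc_batch--;
        return i;
    }
    return -1;
}

/* Returns the index of the zone that served the page, or -1. */
int alloc_two_pass(struct sim_zone *zl, int nr, int local_node, bool zone_reclaim_mode)
{
    assert(nr > 0);
    /* First pass: with zone_reclaim_mode, stay local once a local batch is used up. */
    bool local_only = zone_reclaim_mode && local_batch_exhausted(zl, nr, local_node);
    int z = scan(zl, nr, local_node, local_only);
    if (z >= 0)
        return z;
    /* First pass failed: refill alloc_batch of local-node zones only... */
    for (int i = 0; i < nr; i++)
        if (zl[i].node == local_node)
            zl[i].alloc_batch = zl[i].batch_size;
    /* ...then rescan the whole zonelist as today (goto zonelist_scan). */
    return scan(zl, nr, local_node, false);
}
```

With batches of 3 (Normal) and 1 (DMA32) on node 0 and a node 1 zone behind them, eight allocations with zone_reclaim_mode enabled split 6:2 between the two local zones and never touch node 1; with it disabled, the same sequence sends 4 of the 8 pages to node 1. Refilling only the local batches is what keeps the retry from spilling to the remote node while still spreading pages across the local zones in proportion to their batches.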