Re: [PATCH 12/27] mm, vmscan: Make shrink_node decisions more node-centric

From: Vlastimil Babka
Date: Thu Jun 16 2016 - 09:35:25 EST


On 06/09/2016 08:04 PM, Mel Gorman wrote:
Earlier patches focused on having direct reclaim and kswapd use data that
is node-centric for reclaiming but shrink_node() itself still uses too much
zone information. This patch removes unnecessary zone-based information
with the most important decision being whether to continue reclaim or
not. Some memcg APIs are adjusted as a result even though memcg itself
still uses some zone information.

Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

[...]

@@ -2372,21 +2374,27 @@ static inline bool should_continue_reclaim(struct zone *zone,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
-	inactive_lru_pages = node_page_state(zone->zone_pgdat, NR_INACTIVE_FILE);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
 	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += node_page_state(zone->zone_pgdat, NR_INACTIVE_ANON);
+		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 	if (sc->nr_reclaimed < pages_for_compaction &&
 			inactive_lru_pages > pages_for_compaction)
 		return true;

 	/* If compaction would go ahead or the allocation would succeed, stop */
-	switch (compaction_suitable(zone, sc->order, 0, 0)) {
-	case COMPACT_PARTIAL:
-	case COMPACT_CONTINUE:
-		return false;
-	default:
-		return true;
+	for (z = 0; z <= sc->reclaim_idx; z++) {
+		struct zone *zone = &pgdat->node_zones[z];
+
+		switch (compaction_suitable(zone, sc->order, 0, 0)) {

Using 0 for classzone_idx here was sort of OK when each zone was reclaimed separately. For a Normal allocation, not passing the appropriate classzone_idx (and thus not subtracting the lowmem reserve from free pages) means that a false COMPACT_PARTIAL (or COMPACT_CONTINUE) could be returned for e.g. the DMA zone. That ends reclaim prematurely only for this single zone, which is relatively small anyway, so it's no big deal (and we might even avoid useless over-reclaim, when even reclaiming everything wouldn't get us above the lowmem_reserve).

But in node-centric reclaim, such a premature "return false" from a DMA zone stops reclaim of the whole node. So I think we should involve the real classzone_idx here.

+		case COMPACT_PARTIAL:
+		case COMPACT_CONTINUE:
+			return false;
+		default:
+			/* check next zone */
+			;
+		}
 	}
+	return true;
 }
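
To make the classzone_idx concern above concrete, here is a toy userspace model of the per-zone loop. It is only a sketch, not kernel code: the toy_* names, the struct layout, and all page counts are made up for illustration, and the check is reduced to the "free pages must exceed the mark plus lowmem_reserve[classzone_idx]" comparison that the watermark test performs. The real compaction_suitable() does more (e.g. a fragmentation index check) that is ignored here.

```c
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. Names and numbers are hypothetical. */
enum { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_NR_ZONES };

struct toy_zone {
	unsigned long free_pages;
	unsigned long low_wmark;
	/* pages held back from allocations that target a higher zone */
	unsigned long lowmem_reserve[TOY_NR_ZONES];
};

/* Simplified watermark test: the reserve depends on classzone_idx. */
static bool toy_watermark_ok(const struct toy_zone *zone, unsigned long mark,
			     int classzone_idx)
{
	return zone->free_pages > mark + zone->lowmem_reserve[classzone_idx];
}

/* Stand-in for "compaction could go ahead in this zone". */
static bool toy_compaction_suitable(const struct toy_zone *zone,
				    unsigned int order, int classzone_idx)
{
	unsigned long mark = zone->low_wmark + (2UL << order);

	return toy_watermark_ok(zone, mark, classzone_idx);
}

/* Mirrors the shape of the new loop: any suitable zone stops reclaim. */
static bool toy_should_continue_reclaim(const struct toy_zone *zones,
					int reclaim_idx, unsigned int order,
					int classzone_idx)
{
	for (int z = 0; z <= reclaim_idx; z++) {
		if (toy_compaction_suitable(&zones[z], order, classzone_idx))
			return false;
	}
	return true;
}

/* A node whose DMA zone is mostly free but reserved against Normal requests. */
static const struct toy_zone toy_node[TOY_NR_ZONES] = {
	[TOY_ZONE_DMA]    = { .free_pages = 3000, .low_wmark = 100,
			      .lowmem_reserve = { 0, 4000 } },
	[TOY_ZONE_NORMAL] = { .free_pages = 200, .low_wmark = 1000,
			      .lowmem_reserve = { 0, 0 } },
};
```

With these made-up numbers and order 3, toy_should_continue_reclaim(toy_node, TOY_ZONE_NORMAL, 3, TOY_ZONE_DMA) returns false, because the DMA zone looks suitable once its reserve is ignored, so reclaim of the whole node stops. Passing TOY_ZONE_NORMAL as the classzone_idx instead makes it return true, since the DMA zone's free pages do not exceed the mark plus the 4000-page reserve and the Normal zone is below its own mark. Which value the real code should pass as classzone_idx is left open here, as in the review.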