Re: [PATCH 07/10] mm: vmscan: Block kswapd if it is encountering pages under writeback

From: Rik van Riel
Date: Thu Mar 21 2013 - 14:44:34 EST


On 03/17/2013 09:04 AM, Mel Gorman wrote:
Historically, kswapd used to congestion_wait() at higher priorities if it
was not making forward progress. This made no sense as the failure to make
progress could be completely independent of IO. It was later replaced by
wait_iff_congested() and removed entirely by commit 258401a6 (mm: don't
wait on congested zones in balance_pgdat()) as it was duplicating logic
in shrink_inactive_list().

This is problematic. If kswapd encounters many pages under writeback and
it continues to scan until it reaches the high watermark then it will
quickly skip over the pages under writeback and reclaim clean young
pages or push applications out to swap.

The use of wait_iff_congested() is not suited to kswapd as it will only
stall if the underlying BDI is really congested or a direct reclaimer was
unable to write to the underlying BDI. kswapd bypasses the BDI congestion
as it sets PF_SWAPWRITE but even if this was taken into account then it
would cause direct reclaimers to stall on writeback which is not desirable.

This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is
encountering too many pages under writeback. If this flag is set and
kswapd encounters a PageReclaim page under writeback then it'll assume
that the LRU lists are being recycled too quickly before IO can complete
and block waiting for some IO to complete.
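
The flag logic described above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: the struct and function names are hypothetical stand-ins rather than the kernel's, the "half of the scanned pages" threshold is invented for illustration, and treating every page that is not under writeback as reclaimable is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's struct zone / struct page. */
struct model_zone {
	bool writeback_flag;	/* models the ZONE_WRITEBACK flag */
};

struct model_page {
	bool under_writeback;	/* models PageWriteback() */
	bool reclaim_marked;	/* models PageReclaim() */
};

enum reclaim_action { RECLAIM_PAGE, SKIP_PAGE, BLOCK_ON_IO };

/*
 * Flag the zone when "too many" scanned pages were under writeback.
 * ASSUMPTION: the one-half threshold is made up for this sketch.
 */
static void model_note_scan(struct model_zone *zone,
			    unsigned int nr_writeback, unsigned int nr_scanned)
{
	assert(nr_writeback <= nr_scanned);
	if (nr_scanned > 0 && nr_writeback * 2 >= nr_scanned)
		zone->writeback_flag = true;
}

/* Decide what the modeled kswapd does with one page from the LRU tail. */
static enum reclaim_action model_kswapd_decide(const struct model_zone *zone,
					       struct model_page *page)
{
	if (!page->under_writeback)
		return RECLAIM_PAGE;	/* simplification: clean page is reclaimable */

	/* Already marked once and seen again: the LRU is cycling faster than IO. */
	if (zone->writeback_flag && page->reclaim_marked)
		return BLOCK_ON_IO;

	/* First encounter: mark it for immediate reclaim and move on. */
	page->reclaim_marked = true;
	return SKIP_PAGE;
}
```

In this model a page under writeback is skipped the first time it is seen, and kswapd blocks only once the zone has been flagged and the same marked page comes around again.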

I really like the concept of this patch.

@@ -756,9 +769,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				 */
 				SetPageReclaim(page);
 				nr_writeback++;
+
 				goto keep_locked;
+			} else {
+				wait_on_page_writeback(page);
 			}
-			wait_on_page_writeback(page);
 		}

 		if (!force_reclaim)

This looks like an area for future improvement.

We do not need to wait for this specific page to finish writeback,
we only have to wait for any (bunch of) page(s) to finish writeback,
since we do not particularly care which of the pages from near the
end of the LRU get reclaimed first.

I wonder if this is one of the causes for the high latencies that
are sometimes observed in direct reclaim...
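
To make the latency argument concrete, here is a small userspace sketch. It is not kernel code: the function names are hypothetical, and it assumes each in-flight page has a known writeback completion time, which a real reclaimer of course does not know in advance.

```c
#include <assert.h>
#include <stddef.h>

/* Stall time when the reclaimer waits on one specific page (index idx). */
static unsigned int stall_for_specific(const unsigned int *done_at, size_t n,
				       size_t idx)
{
	assert(idx < n);
	return done_at[idx];
}

/*
 * Stall time when any `batch` completed pages are good enough: the
 * batch-th smallest completion time. O(n^2) scan, fine for a sketch.
 */
static unsigned int stall_for_any(const unsigned int *done_at, size_t n,
				  size_t batch)
{
	assert(batch >= 1 && batch <= n);
	for (size_t i = 0; i < n; i++) {
		size_t less = 0, equal = 0;

		for (size_t j = 0; j < n; j++) {
			if (done_at[j] < done_at[i])
				less++;
			else if (done_at[j] == done_at[i])
				equal++;
		}
		/* done_at[i] is the batch-th smallest value if it falls in this rank range. */
		if (less < batch && batch <= less + equal)
			return done_at[i];
	}
	return 0;	/* not reached for valid input */
}
```

With completion times of 40, 5, 25 and 10 time units, waiting on the first page specifically stalls for 40 units, while waiting for any one page stalls for 5 and waiting for any two stalls for 10.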
