[RFC 2/3] mm: throttle on IO only when there are too many dirty and writeback pages

From: mhocko
Date: Thu Oct 29 2015 - 11:17:43 EST


From: Michal Hocko <mhocko@xxxxxxxx>

wait_iff_congested has been used to throttle allocator before it retried
another round of direct reclaim to allow the writeback to make some
progress and prevent reclaim from looping over dirty/writeback pages
without making any progress. We used to do congestion_wait before
0e093d99763e ("writeback: do not sleep on the congestion queue if
there are no congested BDIs or if significant congestion is not being
encountered in the current zone") but that led to undesirable stalls
and sleeping for the full timeout even when the BDI wasn't congested.
Hence wait_iff_congested was used instead. But it seems that even
wait_iff_congested doesn't work as expected. We might have a small file
LRU list with all pages dirty/writeback and yet the bdi is not congested
so this is just a cond_resched in the end and can end up triggering pre
mature OOM.

This patch replaces the unconditional wait_iff_congested by
congestion_wait which is executed only if we _know_ that the last round
of direct reclaim didn't make any progress and dirty+writeback pages are
more than a half of the reclaimable pages on the zone which might be
usable for our target allocation. This shouldn't reintroduce stalls
fixed by 0e093d99763e because congestion_wait is called only when we
are getting hopeless when sleeping is a better choice than OOM with many
pages under IO.

Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
mm/page_alloc.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9c0abb75ad53..0518ca6a9776 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3191,8 +3191,23 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
*/
if (__zone_watermark_ok(zone, order, min_wmark_pages(zone),
ac->high_zoneidx, alloc_flags, target)) {
- /* Wait for some write requests to complete then retry */
- wait_iff_congested(zone, BLK_RW_ASYNC, HZ/50);
+ unsigned long writeback = zone_page_state(zone, NR_WRITEBACK),
+ dirty = zone_page_state(zone, NR_FILE_DIRTY);
+
+ if (did_some_progress)
+ goto retry;
+
+ /*
+ * If we didn't make any progress and have a lot of
+ * dirty + writeback pages then we should wait for
+ * an IO to complete to slow down the reclaim and
+ * prevent from pre mature OOM
+ */
+ if (2*(writeback + dirty) > reclaimable)
+ congestion_wait(BLK_RW_ASYNC, HZ/10);
+ else
+ cond_resched();
+
goto retry;
}
}
--
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/