Re: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().
From: Tetsuo Handa
Date: Tue May 20 2014 - 10:58:40 EST
Today I discussed with Kosaki-san at LinuxCon Japan 2014 about this issue.
He does not like the idea of adding timeout to throttle loop. As Dave
posted a patch that fixes a bug in XFS delayed allocation, I updated
my patch accordingly.
Although the bug in XFS was fixed by Dave's patch, other kernel code
would have bugs which would fall into this infinite throttle loop.
But to keep the possibility of triggering OOM killer minimum, can we
agree with this updated patch (and in the future adding some warning
mechanism like /proc/sys/kernel/hung_task_timeout_secs for detecting
memory allocation stall)?
Dave, if you are OK with this updated patch, please let me know
commit ID of your patch.
Regards.
----------
>From 408e65d9025e8e24838e7bf6ac9066ba8a9391a6 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 20 May 2014 23:34:34 +0900
Subject: [PATCH] mm/vmscan: Do not throttle kswapd at shrink_inactive_list().
I can observe that commit 35cd7815 "vmscan: throttle direct reclaim when
too many pages are isolated already" causes RHEL7 environment to stall
with 0% CPU usage when a certain type of memory pressure is given.
This is because nobody can reclaim memory due to rules listed below.
(a) XFS uses a kernel worker thread for delayed allocation
(b) kswapd wakes up the kernel worker thread for delayed allocation
(c) the kernel worker thread is throttled due to commit 35cd7815
This patch and commit XXXXXXXX "xfs: block allocation work needs to be
kswapd aware" will solve rule (c).
Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
---
mm/vmscan.c | 20 +++++++++++++++-----
1 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 32c661d..5c6960e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1460,12 +1460,22 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
struct zone *zone = lruvec_zone(lruvec);
struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
- while (unlikely(too_many_isolated(zone, file, sc))) {
- congestion_wait(BLK_RW_ASYNC, HZ/10);
+ /*
+ * Throttle only direct reclaimers. Allocations by kswapd (and
+ * allocation workqueue on behalf of kswapd) should not be
+ * throttled here; otherwise memory allocation will deadlock.
+ */
+ if (!sc->hibernation_mode && !current_is_kswapd()) {
+ while (unlikely(too_many_isolated(zone, file, sc))) {
+ congestion_wait(BLK_RW_ASYNC, HZ/10);
- /* We are about to die and free our memory. Return now. */
- if (fatal_signal_pending(current))
- return SWAP_CLUSTER_MAX;
+ /*
+ * We are about to die and free our memory.
+ * Return now.
+ */
+ if (fatal_signal_pending(current))
+ return SWAP_CLUSTER_MAX;
+ }
}
lru_add_drain();
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/