[PATCH 4/4] mm, vmscan: Potentially stall direct reclaimers on tree_lock contention

From: Mel Gorman
Date: Fri Sep 09 2016 - 06:00:20 EST


If a heavy writer of a single file is forcing contention on the tree_lock
then it may be necessary to temporarily stall the writer in direct reclaim
to allow kswapd to make progress. This patch marks a pgdat as congested if
the tree_lock is contended while reclaiming pages from the tail of the LRU.

On a swap-intensive workload swapping to a ramdisk, the following is observed:

usemem
                       4.8.0-rc5          4.8.0-rc5
                    waitqueue-v1   directcongest-v1
Amean System-1  179.61 (  0.00%)   202.21 (-12.58%)
Amean System-3   68.91 (  0.00%)   105.14 (-52.59%)
Amean System-5   93.09 (  0.00%)    80.98 ( 13.01%)
Amean System-7   90.98 (  0.00%)    81.07 ( 10.90%)
Amean System-8  299.81 (  0.00%)   227.08 ( 24.26%)
Amean Elapsd-1  210.41 (  0.00%)   236.56 (-12.43%)
Amean Elapsd-3   33.89 (  0.00%)    46.78 (-38.06%)
Amean Elapsd-5   25.19 (  0.00%)    23.33 (  7.38%)
Amean Elapsd-7   18.45 (  0.00%)    17.18 (  6.91%)
Amean Elapsd-8   48.80 (  0.00%)    38.09 ( 21.93%)

Note that system CPU usage is reduced at higher thread counts, but it
is not a universal win and the results are known to be highly variable.
The overall time stats look like:

               4.8.0-rc5         4.8.0-rc5
            waitqueue-v1  directcongest-v1
User              462.40            468.18
System           5127.32           4875.92
Elapsed          2364.08           2539.77

It takes longer to complete but uses less system CPU. The benefit
is more noticeable with xfs_io rewriting a file backed by a ramdisk:

                                                 4.8.0-rc5           4.8.0-rc5
                                           waitqueue-v1r24 directcongest-v1r24
Amean pwrite-single-rewrite-async-System    3.23 (  0.00%)      3.21 (  0.80%)
Amean pwrite-single-rewrite-async-Elapsd    3.33 (  0.00%)      3.31 (  0.67%)

               4.8.0-rc5         4.8.0-rc5
            waitqueue-v1  directcongest-v1
User                8.76              9.25
System            392.31            389.10
Elapsed           406.29            403.74

As with the previous patch, a test from Dave Chinner would be necessary
to decide whether this patch is worthwhile. It seems reasonable to favour
workloads that heavily write files over workloads that heavily swap, as
the former situation is normal and reasonable while the latter will
never be optimal.

Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
---
mm/vmscan.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 936070b0790e..953df97abe0c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -771,6 +771,15 @@ static unsigned long remove_mapping_list(struct list_head *mapping_list,
/* Stall kswapd once for 10ms on contention */
if (cmpxchg(&kswapd_exclusive, NUMA_NO_NODE, pgdat->node_id) != NUMA_NO_NODE) {
DEFINE_WAIT(wait);
+
+ /*
+ * Tag the pgdat as congested as it may
+ * indicate contention with a heavy
+ * writer that should stall on
+ * wait_iff_congested.
+ */
+ set_bit(PGDAT_CONGESTED, &pgdat->flags);
+
prepare_to_wait(&kswapd_contended_wait,
&wait, TASK_INTERRUPTIBLE);
io_schedule_timeout(HZ/100);
--
2.6.4