[PATCH 0/5] mm: vmscan: fix kswapd writeback regression

From: Johannes Weiner
Date: Mon Jan 23 2017 - 13:17:05 EST


We noticed a regression on multiple hadoop workloads when moving from
3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page
writeout, causing direct reclaim herds that also don't make progress.

I tracked it down to the thrash avoidance efforts after 3.10 that make
the kernel better at keeping use-once cache and use-many cache sorted
on the inactive and active list, with more aggressive protection of
the active list as long as there is inactive cache. Unfortunately, our
workload's use-once cache is mostly from streaming writes. Waiting for
writes to avoid potential reloads in the future is not a good tradeoff.

These patches do the following:

1. Wake the flushers when kswapd sees a lump of dirty pages. It's
   possible to be below the dirty background limit and still have
   cache velocity push them through the LRU. So start a-flushin'.

2. Let kswapd only write pages that have been rotated twice. This
   makes sure we really tried to get all the clean pages on the
   inactive list before resorting to horrible LRU-order writeback.

3. Move rotating dirty pages off the inactive list. Instead of
   churning or waiting on page writeback, we'll go after clean active
   cache. This might lead to thrashing, but in this state memory
   demand outstrips IO speed anyway, and reads are faster than writes.
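
As a rough illustration of how the three steps combine, here is a toy
userspace model of the per-page decision. It is only a sketch under
stated assumptions, not the code in these patches: every name in it
(toy_page, toy_scan, toy_scan_page, TOY_ROTATIONS_BEFORE_WRITE) is
hypothetical, the threshold of two is taken from the wording of step 2,
and waking the flushers on any dirty page is a simplification of the
"lump of dirty pages" condition in step 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
#define TOY_ROTATIONS_BEFORE_WRITE 2

enum toy_action {
	TOY_RECLAIM,	/* clean page: reclaim it right away */
	TOY_ROTATE,	/* dirty page: move it off the inactive list */
	TOY_WRITE,	/* dirty page already rotated twice: write it */
};

struct toy_page {
	bool dirty;
	unsigned int rotations;	/* times reclaim skipped this dirty page */
};

struct toy_scan {
	bool flushers_woken;	/* step 1: flushers were kicked */
	size_t nr_rotated;	/* step 3: pages moved off the inactive list */
};

/* Decide what a kswapd-like scanner does with one inactive page. */
static enum toy_action toy_scan_page(struct toy_scan *scan,
				     struct toy_page *page)
{
	assert(scan != NULL && page != NULL);

	if (!page->dirty)
		return TOY_RECLAIM;

	/* Step 1 (simplified): any dirty page wakes the flushers. */
	scan->flushers_woken = true;

	/* Step 2: write only pages that have already been rotated twice. */
	if (page->rotations >= TOY_ROTATIONS_BEFORE_WRITE)
		return TOY_WRITE;

	/* Step 3: otherwise rotate the page instead of waiting on it. */
	page->rotations++;
	scan->nr_rotated++;
	return TOY_ROTATE;
}
```

In this model a clean page is reclaimed immediately, while a dirty page
is rotated on its first two encounters and written by the scanner only
on the third, by which point the flushers have had a chance to clean it.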

More details in the individual changelogs.

 include/linux/mm_inline.h        |  7 ++++
 include/linux/mmzone.h           |  2 --
 include/linux/writeback.h        |  2 +-
 include/trace/events/writeback.h |  2 +-
 mm/swap.c                        |  9 ++---
 mm/vmscan.c                      | 68 +++++++++++++++-----------------------
 6 files changed, 41 insertions(+), 49 deletions(-)