Re: [PATCH 0/7] mm: Improve swap path scalability with batched operations

From: Tim Chen
Date: Wed May 04 2016 - 13:13:23 EST

On Wed, 2016-05-04 at 14:45 +0200, Michal Hocko wrote:
> On Tue 03-05-16 14:00:39, Tim Chen wrote:
> [...]
> >
> >  include/linux/swap.h |  29 ++-
> >  mm/swap_state.c      | 253 +++++++++++++-----
> >  mm/swapfile.c        | 215 +++++++++++++--
> >  mm/vmscan.c          | 725 ++++++++++++++++++++++++++++++++++++++-------------
> >  4 files changed, 945 insertions(+), 277 deletions(-)
> This is rather a large change for a normally rare path. We have been
> trying to preserve the anonymous memory as much as possible and rather
> push the page cache out. In fact swappiness is ignored most of the
> time for the vast majority of workloads.
> So this would help anonymous mostly workloads and I am really
> wondering whether this is something worth bothering without further
> and deeper rethinking of our current reclaim strategy. I fully
> realize that the swap out sucks and that the new storage technologies
> might change the way how we think about anonymous memory being so
> "special" wrt. disk based caches but I would like to see a stronger
> use case than "we have been playing with some artificial use case and
> it scales better"

With non-volatile RAM based block devices, a swap device can be very
fast, approaching RAM speed, and can potentially be used as secondary
memory. Simply configuring such NVRAM as swap would be an easy way for
applications to make use of it, without any heavy lifting to change
the applications. But the swap path is so unscalable today that this
use case is infeasible, even more so on multi-threaded server
machines.

I understand that the patch set is somewhat large. Any better ideas
for achieving similar ends would be appreciated. I put out these
patches in the hope that they will spur solutions to improve swap.

Perhaps the first two patches, which break shrink_page_list into
smaller components, can be considered first, as a first step toward
making any changes to the reclaim code easier.