Re: [PATCH 0/7] mm: Improve swap path scalability with batched operations

From: Michal Hocko
Date: Thu May 05 2016 - 03:49:29 EST


On Wed 04-05-16 17:25:06, Johannes Weiner wrote:
> On Wed, May 04, 2016 at 09:49:02PM +0200, Michal Hocko wrote:
> > On Wed 04-05-16 10:13:06, Tim Chen wrote:
> > In order this to work other quite intrusive changes to the current
> > reclaim decisions would have to be made though. This is what I tried to
> > say. Look at get_scan_count() on how we are making many steps to ignore
> > swappiness or prefer the page cache. Even when we make swapout scale it
> > won't help much if we do not swap out that often. That's why I claim
> > that we really should think more long term and maybe reconsider these
> > decisions which were based on the rotating rust for the swap devices.
>
> While I agree that such balancing rework is necessary to make swap
> perform optimally, I don't see why this would be a dependency for
> making the mechanical swapout paths a lot leaner.

Ohh, I didn't say this would be a dependency. I am all for preparing
the code for a better scaling I just felt that the patch is quite large
with a small benefit at this moment and the initial description was not
very clear about the motivation and changes seemed to be shaped by an
artificial test case.

> I'm actually working on improving the LRU balancing decisions for fast
> random IO swap devices, and hope to have something to submit soon.

That is really good to hear!

> > > I understand that the patch set is a little large. Any better
> > > ideas for achieving similar ends will be appreciated.  I put
> > > out these patches in the hope that it will spur solutions
> > > to improve swap.
> > >
> > > Perhaps the first two patches to make shrink_page_list into
> > > smaller components can be considered first, as a first step 
> > > to make any changes to the reclaim code easier.
>
> It makes sense that we need to batch swap allocation and swap cache
> operations. Unfortunately, the patches as they stand turn
> shrink_page_list() into an unreadable mess. This would need better
> refactoring before considering them for upstream merging. The swap
> allocation batching should not obfuscate the main sequence of events
> that is happening for both file-backed and anonymous pages.

That was my first impression as well but to be fair I only skimmed
through the patch so I might be just biased by the size.

> It'd also be great if the remove_mapping() batching could be done
> universally for all pages, given that in many cases file pages from
> the same inode also cluster together on the LRU.

Agreed!

--
Michal Hocko
SUSE Labs