Re: [RFC PATCH 1/1] vmscan: Support multiple kswapd threads per node

From: Matthew Wilcox
Date: Fri Oct 02 2020 - 10:00:51 EST


On Fri, Oct 02, 2020 at 09:53:05AM -0400, Rik van Riel wrote:
> On Fri, 2020-10-02 at 09:03 +0200, Michal Hocko wrote:
> > On Thu 01-10-20 18:18:10, Sebastiaan Meijer wrote:
> > > (Apologies for messing up the mailing list thread, Gmail had fooled
> > > me into
> > > believing that it properly picked up the thread)
> > >
> > > On Thu, 1 Oct 2020 at 14:30, Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > > On Wed 30-09-20 21:27:12, Sebastiaan Meijer wrote:
> > > > > > yes it shows the bottleneck but it is quite artificial. Read
> > > > > > data is
> > > > > > usually processed and/or written back and that changes the
> > > > > > picture a
> > > > > > lot.
> > > > > Apologies for reviving an ancient thread (and apologies in
> > > > > advance for my lack
> > > > > of knowledge on how mailing lists work), but I'd like to offer
> > > > > up another
> > > > > reason why merging this might be a good idea.
> > > > >
> > > > > From what I understand, zswap runs its compression on the same
> > > > > kswapd thread,
> > > > > limiting it to a single thread for compression. Given enough
> > > > > processing power,
> > > > > zswap can get great throughput using heavier compression
> > > > > algorithms like zstd,
> > > > > but this is currently greatly limited by the lack of threading.
> > > >
> > > > Isn't this a problem of the zswap implementation rather than
> > > > general
> > > > kswapd reclaim? Why zswap doesn't do the same as normal swap out
> > > > in a
> > > > context outside of the reclaim?
>
> On systems with lots of very fast IO devices, we have
> also seen kswapd take 100% CPU time without any zswap
> in use.
>
> This seems like a generic issue, though zswap does
> manage to bring it out on lower end systems.

Then, given Mel's observation about contention on the LRU lock, what's
the solution? Partition the LRU list? Batch removals from the LRU list
by kswapd and hand off to per-?node?cpu? worker threads?

Rik, if you have access to one of those systems, I'd be interested to know
whether using file THPs would help with your workload. Tracking only
one THP instead of, say, 16 regular size pages is going to reduce the
amount of time taken to pull things off the LRU list.