Re: [PATCH 0/7] mm/zswap: optimize the scalability of zswap rb-tree

From: Yosry Ahmed
Date: Wed Dec 06 2023 - 15:42:43 EST


On Wed, Dec 6, 2023 at 9:24 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> + Chris Li
>
> Chris, I vaguely remember from our last conversation that you have
> some concurrent efforts to use xarray here, right?

If I recall correctly, the xarray already reduces the lock contention
as lookups are lockless, but Chris knows more here. As you mentioned
in a different email, it would be nice to get some data so that we can
compare different solutions.

>
> On Wed, Dec 6, 2023 at 1:46 AM Chengming Zhou
> <zhouchengming@xxxxxxxxxxxxx> wrote:
> >
> > Hi everyone,
> >
> > This patch series is based on linux-next 20231205 and depends on
> > the "workload-specific and memory pressure-driven zswap writeback" series
> > from Nhat Pham.
> >
> > When testing zswap performance with a kernel build -j32 in a tmpfs
> > directory, I found that the zswap rb-tree scales poorly because it is
> > protected by a single spinlock. That causes heavy lock contention
> > when multiple tasks call zswap_store()/zswap_load() concurrently.
> >
> > A simple solution is to split the single zswap rb-tree into multiple
> > rb-trees, each covering SWAP_ADDRESS_SPACE_PAGES (64M) of swap. The idea
> > comes from commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
> >
> > Although this method can't solve the spinlock contention completely, it
> > can mitigate much of that contention.
> >
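For readers following along, here is a rough userspace sketch of the
splitting idea described above. It is not taken from the patches. The
names struct tree_shard, nr_shards(), shard_index() and shard_for() are
made up for illustration, and the int lock field is only a placeholder
for a real spinlock. The shift value of 14 matches the kernel's
SWAP_ADDRESS_SPACE_SHIFT, which gives 64M per shard assuming 4K pages.

```c
#include <assert.h>

/* Same values as the kernel's swap header: 2^14 pages per address space. */
#define SWAP_ADDRESS_SPACE_SHIFT 14
#define SWAP_ADDRESS_SPACE_PAGES (1UL << SWAP_ADDRESS_SPACE_SHIFT)

/* Hypothetical stand-in for one rb-tree together with its own lock. */
struct tree_shard {
	int lock;                 /* placeholder for spinlock_t */
	unsigned long nr_entries; /* placeholder for the rb_root contents */
};

/* Shards needed to cover a swapfile of nr_pages pages, rounded up. */
static unsigned long nr_shards(unsigned long nr_pages)
{
	assert(nr_pages > 0);
	return (nr_pages + SWAP_ADDRESS_SPACE_PAGES - 1) >> SWAP_ADDRESS_SPACE_SHIFT;
}

/* Which shard a given swap offset falls into. */
static unsigned long shard_index(unsigned long offset)
{
	return offset >> SWAP_ADDRESS_SPACE_SHIFT;
}

/* Pick the shard (and therefore the lock) for a swap offset. */
static struct tree_shard *shard_for(struct tree_shard *shards, unsigned long offset)
{
	return &shards[shard_index(offset)];
}
```

With this layout, offsets 0..16383 land in shard 0 and offset 16384 starts
shard 1, so two tasks working on offsets in different 64M ranges take
different locks. Tasks hitting the same range still contend, which is why
the split can only mitigate the contention rather than remove it.
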
> > Another problem found when testing zswap with our default zsmalloc is that
> > zswap_load() and zswap_writeback_entry() have to allocate a temporary
> > buffer to support !zpool_can_sleep_mapped().
> >
> > Optimize it by reusing the percpu crypto_acomp_ctx->dstmem, which is also
> > used by zswap_store() and protected by the same percpu crypto_acomp_ctx->mutex.
> >
> > Thanks for your review and comments!
> >
> > To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > To: Seth Jennings <sjenning@xxxxxxxxxx>
> > To: Dan Streetman <ddstreet@xxxxxxxx>
> > To: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> > To: Nhat Pham <nphamcs@xxxxxxxxx>
> > To: Johannes Weiner <hannes@xxxxxxxxxxx>
> > To: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > To: Michal Hocko <mhocko@xxxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > Cc: linux-mm@xxxxxxxxx
> > Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
> >
> > ---
> > Chengming Zhou (7):
> >       mm/zswap: make sure each swapfile always have zswap rb-tree
> >       mm/zswap: split zswap rb-tree
> >       mm/zswap: reuse dstmem when decompress
> >       mm/zswap: change dstmem size to one page
> >       mm/zswap: refactor out __zswap_load()
> >       mm/zswap: cleanup zswap_load()
> >       mm/zswap: cleanup zswap_reclaim_entry()
> >
> > include/linux/zswap.h | 4 +-
> > mm/swapfile.c | 10 ++-
> > mm/zswap.c | 233 +++++++++++++++++++++-----------------------------
> > 3 files changed, 106 insertions(+), 141 deletions(-)
> > ---
> > base-commit: 0f5f12ac05f36f117e793656c3f560625e927f1b
> > change-id: 20231206-zswap-lock-optimize-06f45683b02b
> >
> > Best regards,
> > --
> > Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>