Re: [RFC PATCH] mm: bypass swap readahead for zswap

From: Kairui Song

Date: Wed Jun 24 2026 - 06:31:55 EST


On Wed, Jun 24, 2026 at 3:59 PM Alexandre Ghiti <alex@xxxxxxxx> wrote:
>
> Commit 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous
> device") made SWP_SYNCHRONOUS_IO devices (e.g. zram) skip swap readahead.
>
> zswap is the same kind of in-memory, synchronous backend as zram, but it
> is not a swap device flagged SWP_SYNCHRONOUS_IO, so it still goes through
> swapin_readahead().
>
> Here are the results from bypassing readahead for zswap too, measured
> with a kernel build (make -j16) in a memcg, zswap=zstd, shrinker off, on
> Sapphire Rapids, over 3 iterations.
>
> 768M memcg (sustained swap thrash):
>   metric                   mm-new   + bypass   delta
>   build time (s)            405.0      341.7   -15.6%
>   zswap-in (GB)              79.5       53.0   -33%
>   zswap-out (GB)            144.8      115.6   -20%
>   swap readahead (pages)    6.79M      0.45M   -93%
>   swap_ra hit (%)            72.1       89.9   +18pp
>
> 1G memcg (light pressure, build not memory-bound):
>   metric                   mm-new   + bypass   delta
>   build time (s)            177.7      176.0   ~same (no regression)
>   zswap-in (GB)              10.2        7.5   -26%
>   zswap-out (GB)             27.7       25.1   -9%
>   swap readahead (pages)    1.07M      0.08M   -93%
>   swap_ra hit (%)            68.6       87.2   +19pp
>
> The gain is from no longer prefetching pages that are pointless for an
> in-memory backend: readahead inflates anon residency and thrashes the
> page cache (file pages get evicted and re-read), lengthens each fault by
> synchronously (de)compressing a cluster of neighbours, and adds
> compression traffic when those extra pages are reclaimed.
>
> Bypassing swap readahead for zswap therefore makes sense.
>
> Signed-off-by: Alexandre Ghiti <alex@xxxxxxxx>
> ---
>
> - This bypass originally comes from Usama's series that implements
> large folio zswapin: while working on improving this series, I noticed
> the gains I got only came from the bypass of readahead.
>
> include/linux/zswap.h | 6 ++++++
> mm/memory.c | 5 +++--
> mm/zswap.c | 11 +++++++++++
> 3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index 30c193a1207e..b6f0e6198b6f 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -35,6 +35,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
> void zswap_folio_swapin(struct folio *folio);
> bool zswap_is_enabled(void);
> bool zswap_never_enabled(void);
> +bool zswap_present_test(swp_entry_t swp);
> #else
>
> struct zswap_lruvec_state {};
> @@ -69,6 +70,11 @@ static inline bool zswap_never_enabled(void)
> return true;
> }
>
> +static inline bool zswap_present_test(swp_entry_t swp)
> +{
> + return false;
> +}
> +
> #endif
>
> #endif /* _LINUX_ZSWAP_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index ff338c2abe92..5aa1ea9eb48a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4827,8 +4827,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> if (folio)
> swap_update_readahead(folio, vma, vmf->address);
> if (!folio) {
> - /* Swapin bypasses readahead for SWP_SYNCHRONOUS_IO devices */
> - if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
> + /* Swapin bypasses readahead for SWP_SYNCHRONOUS_IO devices and zswap */
> + if (data_race(si->flags & SWP_SYNCHRONOUS_IO) ||
> + zswap_present_test(entry))

Hi Alexandre,

Thanks for the test and patch, very interesting idea.

> diff --git a/mm/zswap.c b/mm/zswap.c
> index 761cd699e0a3..5b85b4d17647 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -234,6 +234,17 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
> >> ZSWAP_ADDRESS_SPACE_SHIFT];
> }
>
> +/**
> + * zswap_present_test - check if a swap entry is currently backed by zswap
> + * @swp: the swap entry to test
> + *
> + * Return: true if @swp has a zswap entry, false otherwise.
> + */
> +bool zswap_present_test(swp_entry_t swp)
> +{
> + return xa_load(swap_zswap_tree(swp), swp_offset(swp));

Better to check zswap_never_enabled() first, to avoid an xa_load() when
it is not needed.
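
To illustrate the ordering I mean, here is a rough userspace model, not
kernel code. The swp_entry_t typedef, the never_enabled flag, the
xa_load_model() helper and the "offset 42 is in zswap" rule are all
stand-ins invented for this sketch; only the shape of
zswap_present_test() is the point. In the real function the lookup would
stay xa_load(swap_zswap_tree(swp), swp_offset(swp)), with the
zswap_never_enabled() test placed in front of it.

```c
/* Userspace model only: every type and helper here is a stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned long val; } swp_entry_t; /* stand-in type */

static bool never_enabled = true;   /* models the zswap_never_enabled() state */
static unsigned long xa_load_calls; /* counts modelled tree lookups */
static int dummy_entry;             /* models a stored zswap entry */

static bool zswap_never_enabled(void)
{
	return never_enabled;
}

/* Stand-in lookup: pretends that only offset 42 is backed by zswap. */
static void *xa_load_model(unsigned long offset)
{
	xa_load_calls++;
	return offset == 42 ? &dummy_entry : NULL;
}

static bool zswap_present_test(swp_entry_t swp)
{
	/* Cheap early exit: skip the lookup if zswap was never enabled. */
	if (zswap_never_enabled())
		return false;
	return xa_load_model(swp.val) != NULL;
}

/* Walks through the three cases; the asserts state the expected behaviour. */
void zswap_model_selfcheck(void)
{
	swp_entry_t hit = { .val = 42 }, miss = { .val = 7 };

	never_enabled = true;
	xa_load_calls = 0;
	assert(!zswap_present_test(hit));
	assert(xa_load_calls == 0); /* lookup skipped entirely */

	never_enabled = false;
	assert(zswap_present_test(hit));
	assert(!zswap_present_test(miss));
	assert(xa_load_calls == 2); /* one lookup per call once enabled */
}
```

With this ordering, systems that never enabled zswap pay only for the
flag check on the fault path, and the lookup cost appears only once
zswap has been turned on at least once.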