Re: [RFC PATCH] mm: bypass swap readahead for zswap

From: David Hildenbrand (Arm)

Date: Wed Jun 24 2026 - 11:04:10 EST


On 6/24/26 09:55, Alexandre Ghiti wrote:
> Commit 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous
> device") made SWP_SYNCHRONOUS_IO devices (e.g. zram) skip swap readahead.
>
> zswap is the same kind of in-memory, synchronous backend as zram, but it is
> not a swap device flagged SWP_SYNCHRONOUS_IO, so swapin from zswap still
> goes through swapin_readahead().
>
> Bypassing readahead for zswap as well gives the results below, measured
> with a kernel build (make -j16) in a memcg, zswap=zstd, shrinker off, on
> Sapphire Rapids, over 3 iterations.
>
> 768M memcg (sustained swap thrash):
>   metric                   mm-new   + bypass   delta
>   build time (s)            405.0      341.7   -15.6%
>   zswap-in (GB)              79.5       53.0   -33%
>   zswap-out (GB)            144.8      115.6   -20%
>   swap readahead (pages)    6.79M      0.45M   -93%
>   swap_ra hit (%)            72.1       89.9   +18pp
>
> 1G memcg (light pressure, build not memory-bound):
>   metric                   mm-new   + bypass   delta
>   build time (s)            177.7      176.0   ~same (no regression)
>   zswap-in (GB)              10.2        7.5   -26%
>   zswap-out (GB)             27.7       25.1   -9%
>   swap readahead (pages)    1.07M      0.08M   -93%
>   swap_ra hit (%)            68.6       87.2   +19pp
>
> The gain is from no longer prefetching pages that are pointless for an
> in-memory backend: readahead inflates anon residency and thrashes the
> page cache (file pages get evicted and re-read), lengthens each fault by
> synchronously (de)compressing a cluster of neighbours, and adds
> compression traffic when those extra pages are reclaimed.
>
> Bypassing swap readahead for zswap therefore makes sense.
>
> Signed-off-by: Alexandre Ghiti <alex@xxxxxxxx>
> ---

[...]

> #endif /* _LINUX_ZSWAP_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index ff338c2abe92..5aa1ea9eb48a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4827,8 +4827,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (folio)
>  		swap_update_readahead(folio, vma, vmf->address);
>  	if (!folio) {
> -		/* Swapin bypasses readahead for SWP_SYNCHRONOUS_IO devices */
> -		if (data_race(si->flags & SWP_SYNCHRONOUS_IO))
> +		/* Swapin bypasses readahead for SWP_SYNCHRONOUS_IO devices and zswap */
> +		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) ||
> +		    zswap_present_test(entry))

This should really be abstracted into a reasonably-named helper that can live in
swap code.
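
To illustrate the shape of such a helper, here is a minimal userspace sketch.
The name swap_skip_readahead() is hypothetical, the struct and flag
definitions are stand-ins for the kernel ones, and zswap_present_test() is
mocked here rather than taken from the patch. It only shows the condition
being folded into one predicate, not a proposed implementation.

```c
#include <stdbool.h>

/* Stand-in for the kernel flag; the real value in include/linux/swap.h differs. */
#define SWP_SYNCHRONOUS_IO (1u << 0)

/* Reduced stand-ins for the kernel types, just enough for the sketch. */
struct swap_info_struct {
	unsigned int flags;
};

typedef struct {
	unsigned long val;
} swp_entry_t;

/*
 * Mock of zswap_present_test() as used in the patch. For illustration only:
 * it pretends that entries with an even value currently live in zswap.
 */
static bool zswap_present_test(swp_entry_t entry)
{
	return (entry.val & 1) == 0;
}

/*
 * Hypothetical helper: true if swapin of @entry should bypass readahead,
 * i.e. the backing device is synchronous or the data currently sits in zswap.
 * The kernel version would wrap the flags read in data_race(), omitted here.
 */
static bool swap_skip_readahead(const struct swap_info_struct *si,
				swp_entry_t entry)
{
	if (si->flags & SWP_SYNCHRONOUS_IO)
		return true;

	return zswap_present_test(entry);
}
```

With something along these lines, the check in do_swap_page() would reduce
to a single call, and the comment describing which backends bypass readahead
could move next to the helper.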

--
Cheers,

David