Re: [RFC PATCH] mm: bypass swap readahead for zswap

From: Nhat Pham

Date: Thu Jun 25 2026 - 12:40:00 EST

On Wed, Jun 24, 2026 at 12:24 PM Barry Song <baohua@xxxxxxxxxx> wrote:
>
> Basically, I have been seeing the same issue recently. If the
> readahead swap entries are also in zswap, we end up doing the
> decompression during one page fault, but then need another page fault
> to fetch the page from the swap cache and install the mapping. In that
> case, readahead may not be beneficial.

You're playing with zswap? Nice :)

It does feel like we're leaving some obvious wins on the table with
zswap here, so any ideas are much appreciated :)

>
> On the other hand, if the readahead swap entries are not in zswap, the
> situation is different.
>
> For example, suppose we fault on the swap entry for address 1 MB and
> readahead brings in the entry for 1 MB + 4 KB. If both entries are in
> zswap, readahead does not seem like a good trade-off. However, if the
> 1 MB + 4 KB entry is not in zswap and would otherwise require storage
> I/O, then readahead can be beneficial.

Yeah I can see the edge case you and Yosry brought up here. As we move
towards supporting multiple swap backends in the same system, that can
certainly happen. I'm hoping that locality will bail us out - pages
close together in virtual address space hopefully have similar
lifetime, access temperature, compressibility, and will go to the same
swap backend, etc. - but who knows :)

What if we still follow the readahead logic, but at each slot in the
readahead window, we check whether the entry is owned by zswap or
belongs to a sync-io device, and if so skip readahead for that
slot...?

That way, we still submit the IO for the remaining slots, but skip the
entries that would require synchronous decompression, which might
affect latency...?
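
To make the idea concrete, here is a rough userspace model of the
per-slot check, not actual kernel code. The enum, the helper names and
the notion of a flat "window" array are all made up for illustration;
the real thing would have to query zswap and the swap device flags for
each entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend tags for this model; these are not kernel types. */
enum slot_backend {
	SLOT_EMPTY,      /* no swap entry at this slot */
	SLOT_ZSWAP,      /* entry currently held by zswap */
	SLOT_SYNC_IO,    /* entry on a synchronous-IO device */
	SLOT_ASYNC_DISK, /* entry on a device where IO is submitted asynchronously */
};

/* Only slots that need real asynchronous IO are worth reading ahead. */
static bool slot_worth_readahead(enum slot_backend b)
{
	return b == SLOT_ASYNC_DISK;
}

/*
 * Walk the readahead window and pick the slots to bring in. The faulting
 * slot is always selected, whatever its backend; every other slot is
 * selected only if it passes slot_worth_readahead(). Selected indices are
 * written to out[] (which must hold at least nr entries) and the number
 * of selected slots is returned.
 */
static size_t pick_readahead_slots(const enum slot_backend *win, size_t nr,
				   size_t fault_idx, size_t *out)
{
	size_t n = 0;

	assert(fault_idx < nr);
	for (size_t i = 0; i < nr; i++) {
		if (i == fault_idx || slot_worth_readahead(win[i]))
			out[n++] = i;
	}
	return n;
}
```

So for a window of { zswap, disk, zswap (faulting), sync-io, disk }, this
model would read the faulting entry plus the two disk entries, and leave
the other zswap and sync-io entries for their own faults.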

>
> So I implemented a rather ugly fault_around-like mechanism in
> do_swap_page(). At least with page-cluster == 1, I am seeing a
> performance improvement, as the readahead folios can be mapped
> directly and do not require a second page fault.

Hmm so what your proposal buys us is, if we're already doing readahead
(synchronously), might as well install the pages in the window into
the page tables? :)

Could you explain how this improves performance? If the pages are
needed, we are transferring some work from future page faults to the
current page fault. If the pages are not needed, then we are just
wasting cycles, right?
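
The trade-off I have in mind, as a toy cost model. The struct, the
function and the cost units are all hypothetical, and it assumes the
only things that differ between the two schemes are a fixed per-fault
overhead and the cost of installing one mapping:

```c
#include <assert.h>

/* All costs are in arbitrary units; names are made up for this sketch. */
struct fault_costs {
	double fault_overhead; /* fixed cost of taking one extra page fault */
	double map_cost;       /* cost of installing one mapping for a folio
				  that is already in the swap cache */
};

/*
 * Net saving from mapping nr_ra readahead pages during the current fault,
 * given that nr_used of them are touched later. Positive means mapping
 * them up front wins; negative means it wasted cycles.
 */
static double fault_around_net_saving(struct fault_costs c,
				      unsigned int nr_ra,
				      unsigned int nr_used)
{
	assert(nr_used <= nr_ra);

	/* Baseline: each page touched later pays a full fault plus the map. */
	double later = nr_used * (c.fault_overhead + c.map_cost);
	/* Fault-around: every readahead page pays the map cost right now. */
	double upfront = nr_ra * c.map_cost;

	return later - upfront;
}
```

In this model the win comes entirely from fault_overhead on the pages
that do get used, and the loss is map_cost on the ones that don't, so
the answer depends on how those two compare and on the hit rate of the
readahead window.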

>
> It is admittedly quite ugly and is only meant as a proof of concept :-)

Dw, arguably it's easier to reason about with everything open-coded ;)