Re: [PATCH v5 00/21] Virtual Swap Space
From: Kairui Song
Date: Thu Apr 23 2026 - 02:21:35 EST
On Thu, Apr 23, 2026 at 4:27 AM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
>
> On Wed, Apr 22, 2026 at 10:18:35AM +0800, Kairui Song wrote:
> > On Wed, Apr 22, 2026 at 8:26 AM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Mar 20, 2026 at 12:27:14PM -0700, Nhat Pham wrote:
> > > >
> > > > This patch series implements the virtual swap space idea, based on Yosry's
> > > > proposals at LSFMMBPF 2023 (see [1], [2], [3]), as well as valuable
> > > > inputs from Johannes Weiner. The same idea (with different
> > > > implementation details) has been floated by Rik van Riel since at least
> > > > 2011 (see [8]).
> > >
> > > Unfortunately, I haven't been able to keep up with virtual swap and swap
> > > table development, as my time is mostly being spent elsewhere these
> > > days. I do have a question tho, which might have already been answered
> > > or is too naive/stupid -- so apologies in advance.
> >
> > Hi Yosry,
> >
> > Not a stupid question at all—it's actually spot on. :)
> >
> > >
> > > Given the recent advancements in the swap table and that most metadata
> > > and the swap cache are already being pulled into it, is it possible to
> > > use the swap table in the virtual swap layer instead of the xarray?
> > >
> > > Basically pull the swap table one layer higher, and have it point to
> > > either a zswap entry or a physical swap slot (or others in the future)?
> > > If my understanding is correct, we kinda get the best of both worlds and
> > > reuse the integration already done by the swap table with the swap
> > > cache, as well as the lock partitioning.
> > >
> > > In this world, the clusters would be in the virtual swap space, and we'd
> > > create the clusters on-demand as needed.
> > >
> > > Does this even work or make the least amount of sense (I guess the
> > > question is for both Nhat and Kairui)?
> > >
> >
> > Yes, this absolutely works. In fact, I previously posted a working RFC
> > based on this idea. In that series, clusters are dynamically
> > allocated, allowing the swap space to be dynamically sized
> > (essentially infinite) while reusing all the existing infrastructure:
> > https://lore.kernel.org/all/20260220-swap-table-p4-v1-0-104795d19815@xxxxxxxxxxx/
>
> There are a few aspects that I don't agree with in this RFC, and I think
> Nhat and Johannes raised most of them. Mostly that I don't want to
> expose ghost swapfiles or similar to userspace.
>
> I think userspace's view of swapfiles should remain the same and reflect
> the physical swap slots. The virtual swap layer should be completely
> transparent in this case. Userspace shouldn't need to configure it in
> any way.

That approach is definitely doable. For example, with that RFC we
could simply drop the interface I introduced and enable it via a
different knob, and that would be very close to it. :)

Using a swapfile to represent the virtual layer externally just made
it more flexible. I agree that the RFC design was a bit confusing and
could be improved. There is no technical difficulty in hiding it from
userspace; it's mostly a design choice. And even if we don't use a
swapfile to represent it internally, all the other infrastructure can
still be reused without much modification.
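
For illustration only, here is a rough userspace sketch of the layout
discussed above: a table of clusters that are allocated on demand, where
each virtual slot points at either a zswap entry or a physical swap
slot. Every name and size in it (vslot, vcluster, vswap_space,
VCLUSTER_SLOTS, and so on) is made up for this example. It is not code
from the RFC or from the virtual swap series, and it ignores locking,
the swap cache, and memcg accounting entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define VCLUSTER_SLOTS     512UL   /* virtual slots per cluster */
#define VSWAP_MAX_CLUSTERS 1024UL  /* arbitrary cap for this sketch */

/* What a virtual slot currently resolves to. */
enum vslot_kind { VSLOT_NONE = 0, VSLOT_ZSWAP, VSLOT_PHYS };

struct vslot {
	enum vslot_kind kind;
	union {
		void *zswap_entry;                 /* stand-in for a zswap entry */
		struct {
			unsigned int dev;          /* which swap device */
			unsigned long offset;      /* slot offset on that device */
		} phys;
	} u;
};

struct vcluster {
	struct vslot slots[VCLUSTER_SLOTS];
};

struct vswap_space {
	/* NULL until the first slot in that cluster is used. */
	struct vcluster *clusters[VSWAP_MAX_CLUSTERS];
};

/*
 * Look up the slot for a virtual index. With create != 0 the backing
 * cluster is allocated on first use; calloc() zero-fills it, so every
 * new slot starts out as VSLOT_NONE.
 */
static struct vslot *vswap_slot(struct vswap_space *vs, unsigned long idx,
				int create)
{
	unsigned long ci = idx / VCLUSTER_SLOTS;

	assert(vs != NULL);
	if (ci >= VSWAP_MAX_CLUSTERS)
		return NULL;
	if (!vs->clusters[ci]) {
		if (!create)
			return NULL;
		vs->clusters[ci] = calloc(1, sizeof(struct vcluster));
		if (!vs->clusters[ci])
			return NULL;
	}
	return &vs->clusters[ci]->slots[idx % VCLUSTER_SLOTS];
}

/* Point a virtual slot at a (stand-in) zswap entry. Returns 0 on success. */
static int vswap_store_zswap(struct vswap_space *vs, unsigned long idx,
			     void *entry)
{
	struct vslot *s = vswap_slot(vs, idx, 1);

	if (!s)
		return -1;
	s->kind = VSLOT_ZSWAP;
	s->u.zswap_entry = entry;
	return 0;
}

/* Point a virtual slot at a physical swap slot. Returns 0 on success. */
static int vswap_store_phys(struct vswap_space *vs, unsigned long idx,
			    unsigned int dev, unsigned long offset)
{
	struct vslot *s = vswap_slot(vs, idx, 1);

	if (!s)
		return -1;
	s->kind = VSLOT_PHYS;
	s->u.phys.dev = dev;
	s->u.phys.offset = offset;
	return 0;
}
```

In this toy model, whether anything like a swapfile is shown to
userspace is orthogonal to the table itself: the same lookup and store
paths work either way, which is the sense in which the choice is mostly
a design decision.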

Using a swapfile does have its benefits, though. For example, the
virtual layer could act as an ordinary tier following YoungJun's
design:
https://lore.kernel.org/linux-mm/20260421055323.940344-1-youngjun.park@xxxxxxx/

It also means we wouldn't need to introduce things like a new,
virtual-specific swapoff mechanism.

> In an ideal world, the only noticeable change from userspace is that
> with zswap, compressed pages would stop using slots in the swapfile and
> charging the memcg for them -- and that zswap would work even without a
> swapfile, by just enabling it. This is admittedly a user-visible
> behavioral change, but I am hoping that's a good one that we can live
> with.

Totally agree with the ideal end goal for zswap. I'm just not sure
that's the right place to start, since zswap doesn't always apply.
For instance, we have SSDs with built-in compression, software-based
storage stacks with built-in compression and deduplication, swap over
RDMA, and, most notably, ZRAM users. They don't necessarily need zswap
or a virtual layer, and the upper layer should be kept as simple as
possible.

> If there are real concerns about this, we can discuss things like a knob
> or config option to keep charging zswap pages as swap slots (ew..) or
> only allow zswap with a real swapfile (double ew..). But I am really
> hoping we can get away with changing the semantics without doing this.
>
> We can add extra interfaces for virtual swap as needed, e.g. virtual
> swapoff that you mentioned to clear the swap cache, or stats about the
> virtual swap space (which translates to memory overhead).
Good suggestions.
> > It cleans up a lot of allocation and ordering, as well as memcg
> > swap lookups. Since some of these problems were also observed in the
> > vss discussion, I think this will make things easier for all of us:
> > https://lore.kernel.org/all/20260421-swap-table-p4-v3-0-2f23759a76bc@xxxxxxxxxxx/
>
> Yeah I saw that (but didn't really have time to do anything else about
> it). Splitting this out is definitely the right thing to do, and the
> series looks great from a very high level. Awesome work, as usual :)
Thanks!