Re: [PATCH v5 00/21] Virtual Swap Space

From: Nhat Pham

Date: Fri Apr 24 2026 - 13:32:18 EST


On Thu, Apr 23, 2026 at 9:16 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:

My apologies for the delayed response - I'm cleaning things up, and
fighting with some memsw issue. I changed the semantics of the
memory.swap counter a bit, but that makes it diverge operationally
from memsw. I need to be careful not to double charge or double
uncharge here.

>
> On Fri, Apr 24, 2026 at 04:48, Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
> > > Using a swapfile does have its benefits, though. For example, the
> > > virtual layer could act as an ordinary tier following YoungJun's
> > > design:
> > > https://lore.kernel.org/linux-mm/20260421055323.940344-1-youngjun.park@xxxxxxx/
> >
> > Hmm I didn't look too closely at this but I don't understand how
> > making it a swapfile helps with tiering? If anything, I think it makes
> > tiering more difficult. For tiering to work, we need an
> > abstraction/redirection layer, such that we don't need to update the
> > page tables (or shmem pagecache) if we demote/promote pages. That is
> > exactly the use case for a virtual swap layer. The page tables point
> > at a virtual swap ID and the backend could change transparently (e.g.
> > for zswap writeback, or tiering).
> >
> > If we make the virtual layer a swapfile, how do we demote/promote
> > without updating page tables?
> >
> > IOW, I think the whole reason we want a virtual layer is to separate
> > the backends, which would facilitate tiering. If the virtual layer is
> > itself a swapfile, wouldn't it become one of the tiers?
>
> That's exactly what I hoped, virtual layer being part of the tier.
> Tier could be set up per task / cgroup. So is the virtual tier.
>
> A standalone implementation of the virtual layer is heavier than
> being a swapfile. Actually I think at this point, it is the word
> "swapfile" that is misleading now. We may rename it to "swap mapping" or
> something. A swap mapping could be physical or virtual. A virtual
> mapping can realloc from physical ones (redirect), and swapoff of
> physical ones just reads their data into the virtual mapping's swap cache.
>
> I think it's actually functionally very similar to Nhat's design
> already from a high level, the only difference is we don't need
> standalone infra for virtual parts.

Well yeah, great minds think alike ;)

As you have noticed, I have also converged towards a lot of your
metadata design and operational arrangement.

Case in point is the delaying of cgroup check merging with swap
freeing - I did not notice that patch you had in your series, but I
realized I had to do it as well after studying the regression for
a while.

(I did think about proposing that outside of the vswap series, but I
was thinking it would not be a problem at all with the current code.
But in hindsight, since you're also merging swap cgroup with swap
table, it will have similar implications, albeit less expensive due
to no xarray indirection).

Hopefully we can iron out the rest of the differences. I have a couple
more use cases in mind (compressed writeback from zswap, discontiguous
fallback for swapout, etc.), but without virtualization they seem like
a dead end :(

And Gregory's cram stuff too - I think it's not undoable without
vswap, but it's just a lot hairier :(

>
> For swapoff or migration you don't need to touch the page table, same
> as in this series, just update the virtual swap mapping to be cached
> or update the entry, it's identical to what this series is doing.

Yeah the swapoff is no big deal.
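
For anyone following along, here is a toy userspace sketch of the
indirection discussed above: the page table entry holds only a virtual
swap ID, and the backend behind that ID can change without touching the
entry. This is not code from either series. Every name in it
(vswap_desc, vswap_store, vswap_redirect, vswap_lookup, the backend
enum) is made up for illustration, and it ignores locking, reference
counting, and cgroup accounting entirely.

```c
#include <assert.h>

/* Hypothetical backend kinds; these are not the kernel's types. */
enum vswap_backend { VSWAP_NONE, VSWAP_ZSWAP, VSWAP_DISK, VSWAP_CACHED };

/* One descriptor per virtual swap ID: where the data currently lives. */
struct vswap_desc {
	enum vswap_backend backend;
	unsigned long slot;	/* backend-specific location */
};

#define NR_VSWAP 8
static struct vswap_desc vswap_table[NR_VSWAP];

/* The toy "PTE" stores only the virtual swap ID, never a backend slot. */
typedef unsigned int toy_pte_t;

/* Swap out: record the initial backend and return the value for the PTE. */
static toy_pte_t vswap_store(unsigned int id, enum vswap_backend backend,
			     unsigned long slot)
{
	assert(id < NR_VSWAP);
	vswap_table[id].backend = backend;
	vswap_table[id].slot = slot;
	return id;
}

/*
 * Writeback, tiering, or swapoff: move the data to another backend.
 * Only the descriptor changes; no PTE is rewritten.
 */
static void vswap_redirect(unsigned int id, enum vswap_backend backend,
			   unsigned long slot)
{
	assert(id < NR_VSWAP);
	vswap_table[id].backend = backend;
	vswap_table[id].slot = slot;
}

/* Swap in: resolve the PTE's virtual ID to the current backend location. */
static struct vswap_desc *vswap_lookup(toy_pte_t pte)
{
	assert(pte < NR_VSWAP);
	return &vswap_table[pte];
}
```

Storing ID 3 on the zswap backend and then redirecting it to disk leaves
the PTE value at 3, while a lookup through that same PTE now resolves to
the disk slot. Swapoff in this model would be a redirect to VSWAP_CACHED.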