Re: [RFC PATCH 00/14] Virtual Swap Space
From: Kairui Song
Date: Tue Apr 08 2025 - 12:59:56 EST
On Wed, Apr 9, 2025 at 12:48 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Tue, Apr 8, 2025 at 9:23 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >
> >
> > Thanks for sharing the code, my initial idea after the discussion at
> > LSFMM is that there is a simple way to combine this with the "swap
> > table" [1] design of mine to solve the performance issue of this
> > series: just store the pointer of this struct in the swap table. It's
> > a brute-force, glue-like solution, but the contention issue will be
> > gone.
>
> Was waiting for your submission, but I figured I should send what I
> had out first for immediate feedback :)
>
> Johannes actually proposed something similar to your physical swap
> allocator for the virtual swap slots allocation logic, to solve our
> lock contention problem. My apologies - I should have name-dropped you
> in the RFC cover as well (the cover was a bit outdated, and I haven't
> updated the newest developments that came from the LSFMMBPF
> conversation in the cover letter).
>
> >
> > Of course it's not a good approach; ideally the data structure can be
> > simplified to an entry type in the swap table. The swap table series
> > handles locking and synchronization using either the cluster lock
> > (reusing the swap allocator and existing swap logic) or the folio lock
> > (kind of like the page cache). So many parts can be much simplified; I
> > think it will be at most ~32 bytes per page with a virtual device
> > (including the intermediate pointers). It will require quite some work
> > though.
> >
> > The good side with that approach is we will have a much lower memory
> > overhead and even better performance. And the virtual space part will
> > be optional; for a non-virtual setup the memory consumption will be only
> > 8 bytes per page and also dynamically allocated, as discussed at
> > LSFMM.
>
> I think one problem with your design, which I alluded to at the
> conference, is that it doesn't quite work for our requirements -
> namely the separation of zswap from its underlying backend.
>
> All the metadata HAVE to live at the virtual layer. For one, we are
> duplicating the logic if we push this to the backend.
>
> But more than that, there are lifetime operations that HAVE to be
> backend-agnostic. For instance, on the swap out path, when we unmap
> the page from the page table, we do swap_duplicate() (i.e., increasing
> the swap count/reference count of the swap entries). At that point, we
> have not made (and cannot make) a decision regarding the backend storage
> yet, and thus do not have any backend-specific place to hold this
> piece of information. If we couple all the backends then yeah sure we
> can store it at the physical swapfile level, but that defeats the
> purpose of swap virtualization :)
Ah, now I get why you have to store the data in the virtual layer.
I was thinking that doing it in the physical layer would make it easier
to reuse what swap already has. But if you need to be completely
backend-agnostic, then just keep it in the virtual layer. That doesn't
seem to be a fundamental issue; I think it could be worked out somehow,
e.g. by using another table type. I'll check whether that would work
after I've finished the initial parts.
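
To illustrate the point about backend-agnostic lifetime operations, here
is a minimal sketch in C of a per-slot descriptor kept in the virtual
layer. It is purely illustrative: struct vswap_desc, enum vswap_backend
and the vswap_*() helpers are hypothetical names, not taken from either
series, and locking is omitted entirely. It only shows that the
reference count can be taken before any backend has been chosen, and
that rebinding the backend later leaves the count untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical backend kinds; VSWAP_NONE means "not decided yet". */
enum vswap_backend { VSWAP_NONE, VSWAP_ZSWAP, VSWAP_SWAPFILE };

/* Hypothetical per-slot descriptor living in the virtual layer. */
struct vswap_desc {
	unsigned int swap_count;      /* references held by page tables */
	enum vswap_backend backend;   /* may be chosen long after the first dup */
	unsigned long backend_handle; /* opaque: zswap entry or physical slot */
};

/* Analogue of swap_duplicate(): it needs no backend at all. */
static void vswap_dup(struct vswap_desc *d)
{
	d->swap_count++;
}

/* Bind (or rebind) the slot to a backend without touching the count. */
static void vswap_set_backend(struct vswap_desc *d, enum vswap_backend b,
			      unsigned long handle)
{
	d->backend = b;
	d->backend_handle = handle;
}

/* Drop one reference; returns true when the slot can be freed. */
static bool vswap_put(struct vswap_desc *d)
{
	assert(d->swap_count > 0);
	return --d->swap_count == 0;
}
```

In this sketch a slot can be duplicated while its backend is still
VSWAP_NONE, then moved from zswap to a swapfile, with swap_count
unaffected throughout.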
>
> >
> > So sorry that I still have a few parts undone, looking forward to
> > posting in about one week, e.g. after this weekend if it goes well. I'll
> > also try to check your series first to see how these can be
> > combined better.
>
> Of course, I'm not against collaboration :) As I mentioned earlier, we
> need more work on the allocation part, for which your physical swapfile
> allocator should either work as-is or serve as the inspiration.
>
> Cheers,
> Nhat