Re: [PATCH v3 00/20] Virtual Swap Space

From: Chris Li

Date: Tue Feb 10 2026 - 16:24:28 EST


Hi Johannes,

On Mon, Feb 9, 2026 at 6:36 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> Hi Chris,
>
> On Mon, Feb 09, 2026 at 04:20:21AM -0800, Chris Li wrote:
> > Is the per swap slot entry overhead 24 bytes in your implementation?
> > The current swap overhead is 3 static + 8 dynamic, your 24 dynamic is a
> > big jump. You can argue that 8->24 is not a big jump. But it is an
> > unnecessary price compared to the alternatives, which is 8 dynamic +
> > 4 (optional redirect).
>
> No, this is not the net overhead.

I am talking about the total metadata overhead per swap entry. Not net.

> The descriptor consolidates and eliminates several other data
> structures.

It also adds members that were not there before and makes some
members bigger along the way. For example, the swap_map grows from a
1-byte to a 4-byte count.

>
> Here is the more detailed breakdown:

It seems you did not finish your sentence before sending your reply.

Anyway, I saw the total per swap entry overhead bump to 24 bytes
dynamic. If you disagree, let me know what the correct number for VS
is.

Chris

> > > The size of the virtual swap descriptor is 24 bytes. Note that this is
> > > not all "new" overhead, as the swap descriptor will replace:
> > > * the swap_cgroup arrays (one per swap type) in the old design, which
> > > is a massive source of static memory overhead. With the new design,
> > > it is only allocated for used clusters.
> > > * the swap tables, which hold the swap cache and workingset shadows.
> > > * the zeromap bitmap, which is a bitmap of physical swap slots to
> > > indicate whether the swapped out page is zero-filled or not.
> > > * a huge chunk of the swap_map. The swap_map is now replaced by 2 bitmaps,
> > > one for allocated slots, and one for bad slots, representing 3 possible
> > > states of a slot on the swapfile: allocated, free, and bad.
> > > * the zswap tree.
> > >
> > > So, in terms of additional memory overhead:
> > > * For zswap entries, the added memory overhead is rather minimal. The
> > > new indirection pointer neatly replaces the existing zswap tree.
> > > We really only incur less than one word of overhead for swap count
> > > blow up (since we no longer use swap continuation) and the swap type.
> > > * For physical swap entries, the new design will impose fewer than 3 words
> > > of memory overhead. However, as noted above this overhead is only for
> > > actively used swap entries, whereas in the current design the overhead is
> > > static (including the swap cgroup array for example).
> > >
> > > The primary victim of this overhead will be zram users. However, as
> > > zswap now no longer takes up disk space, zram users can consider
> > > switching to zswap (which, as a bonus, has a lot of useful features
> > > out of the box, such as cgroup tracking, dynamic zswap pool sizing,
> > > LRU-ordering writeback, etc.).
> > >
> > > For a more concrete example, suppose we have a 32 GB swapfile (i.e.
> > > 8,388,608 swap entries), and we use zswap.
> > >
> > > 0% usage, or 0 entries:
> > > * Old design total overhead: 25.00 MB
> > > * Vswap total overhead: 0.00 MB
> > >
> > > 25% usage, or 2,097,152 entries:
> > > * Old design total overhead: 57.00 MB
> > > * Vswap total overhead: 48.25 MB
> > >
> > > 50% usage, or 4,194,304 entries:
> > > * Old design total overhead: 89.00 MB
> > > * Vswap total overhead: 96.50 MB
> > >
> > > 75% usage, or 6,291,456 entries:
> > > * Old design total overhead: 121.00 MB
> > > * Vswap total overhead: 144.75 MB
> > >
> > > 100% usage, or 8,388,608 entries:
> > > * Old design total overhead: 153.00 MB
> > > * Vswap total overhead: 193.00 MB
> > >
> > > So even in the worst case scenario for virtual swap, i.e. when we
> > > somehow have an oracle to correctly size the swapfile for zswap
> > > pool to 32 GB, the added overhead is only 40 MB, which is a mere
> > > 0.12% of the total swapfile :)
> > >
> > > In practice, the overhead will be closer to the 50-75% usage case, as
> > > systems tend to leave swap headroom for pathological events or sudden
> > > spikes in memory requirements. The added overhead in these cases is
> > > practically negligible. And in deployments where swapfiles for zswap
> > > were previously sparsely used, switching over to virtual swap will
> > > actually reduce memory overhead.
> > >
> > > Doing the same math for the disk swap, which is the worst case for
> > > virtual swap in terms of swap backends:
> > >
> > > 0% usage, or 0 entries:
> > > * Old design total overhead: 25.00 MB
> > > * Vswap total overhead: 2.00 MB
> > >
> > > 25% usage, or 2,097,152 entries:
> > > * Old design total overhead: 41.00 MB
> > > * Vswap total overhead: 66.25 MB
> > >
> > > 50% usage, or 4,194,304 entries:
> > > * Old design total overhead: 57.00 MB
> > > * Vswap total overhead: 130.50 MB
> > >
> > > 75% usage, or 6,291,456 entries:
> > > * Old design total overhead: 73.00 MB
> > > * Vswap total overhead: 194.75 MB
> > >
> > > 100% usage, or 8,388,608 entries:
> > > * Old design total overhead: 89.00 MB
> > > * Vswap total overhead: 259.00 MB
> > >
> > > The added overhead is 170 MB, which is 0.5% of the total swapfile size,
> > > again in the worst case when we have a sizing oracle.
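
For anyone who wants to check the arithmetic in the quoted tables, here is a minimal Python sketch that reproduces them. The per-entry byte costs are not taken from the patch; they are back-derived from the quoted totals, so treat them as assumptions: 3 bytes + 1 bit static per slot for the old design, 16 (zswap) or 8 (disk) dynamic bytes per used entry for the old design, and 24 bytes + 1 bit (zswap) or 32 bytes + 1 bit plus two static bitmaps (disk) for vswap. The 4 KiB page size and the names `COSTS`, `overhead_mb`, `table`, and `added_pct_of_swapfile` are likewise just illustrative.

```python
# Sketch: reproduce the quoted overhead tables for a 32 GB swapfile.
# All per-entry byte costs are back-derived from the quoted totals;
# they are assumptions for illustration, not values read from the patch.

MIB = 1024 * 1024
PAGE_SIZE = 4096                          # assumed 4 KiB pages
SWAPFILE_MB = 32 * 1024                   # 32 GB swapfile
ENTRIES = SWAPFILE_MB * MIB // PAGE_SIZE  # 8,388,608 swap entries
BIT = 1 / 8                               # one bitmap bit, in bytes

# backend -> (old static bytes/slot, old dynamic bytes/used entry,
#             vswap static bytes/slot, vswap dynamic bytes/used entry)
COSTS = {
    "zswap": (3 + BIT, 16, 0, 24 + BIT),
    "disk": (3 + BIT, 8, 2 * BIT, 32 + BIT),
}


def overhead_mb(static_per_slot, dynamic_per_entry, used):
    """Total metadata in MB: static cost for every slot, dynamic for used ones."""
    return (ENTRIES * static_per_slot + used * dynamic_per_entry) / MIB


def table(backend):
    """Return (usage %, old design MB, vswap MB) rows for one backend."""
    old_s, old_d, vs_s, vs_d = COSTS[backend]
    rows = []
    for pct in (0, 25, 50, 75, 100):
        used = ENTRIES * pct // 100
        rows.append((pct,
                     overhead_mb(old_s, old_d, used),
                     overhead_mb(vs_s, vs_d, used)))
    return rows


def added_pct_of_swapfile(backend):
    """Extra vswap overhead at 100% usage, as a percentage of the swapfile."""
    _, old, new = table(backend)[-1]
    return (new - old) / SWAPFILE_MB * 100


if __name__ == "__main__":
    for backend in COSTS:
        print(backend)
        for pct, old, new in table(backend):
            print(f"  {pct:3d}% usage: old {old:7.2f} MB, vswap {new:7.2f} MB")
        print(f"  added at 100%: {added_pct_of_swapfile(backend):.2f}% of swapfile")
```

Under these assumed constants, the dynamic per-entry cost goes from 16 to about 24 bytes for zswap and from 8 to about 32 bytes for disk swap, while the static 3 bytes + 1 bit per slot goes away (disk swap keeps two bitmaps). Whether those are the right constants for the actual implementation is exactly the open question in this thread.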