Re: [PATCH RFC 00/15] mm, swap: swap table phase IV with dynamic ghost swapfile
From: Johannes Weiner
Date: Mon Feb 23 2026 - 11:57:57 EST
On Fri, Feb 20, 2026 at 07:42:01AM +0800, Kairui Song via B4 Relay wrote:
> - 8 bytes per slot memory usage, when using only plain swap.
> - And the memory usage can be reduced to 3 or only 1 byte.
> - 16 bytes per slot memory usage, when using ghost / virtual zswap.
> - Zswap can just use ci_dyn->virtual_table to free up its content
> completely.
> - And the memory usage can be reduced to 11 or 8 bytes using the same
> code above.
> - 24 bytes only if reverse mapping is in use.
That seems to tie us pretty permanently to duplicate metadata.
For every page that was written to disk through zswap, we have an
entry in the ghost swapfile, and an entry in the backend swapfile, no?
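To put a rough number on that, here is a minimal sketch of the combined cost. The per-slot sizes come from the quoted cover letter; the model itself (one ghost slot plus one backend slot kept alive for every page written back through zswap) and the helper name are assumptions for illustration, not something the series states.

```python
# Per-slot metadata sizes in bytes, as listed in the quoted cover letter.
PLAIN_SLOT_BYTES = 8         # plain swap slot
GHOST_SLOT_BYTES = 16        # ghost / virtual zswap slot
GHOST_RMAP_SLOT_BYTES = 24   # ghost slot with reverse mapping in use


def writeback_metadata_bytes(pages: int, reverse_map: bool = True) -> int:
    """Metadata held for pages written to disk through zswap.

    Assumed model (hypothetical): each such page keeps one ghost slot
    and one backend swapfile slot at the same time.
    """
    ghost = GHOST_RMAP_SLOT_BYTES if reverse_map else GHOST_SLOT_BYTES
    return pages * (ghost + PLAIN_SLOT_BYTES)


if __name__ == "__main__":
    # 1 GiB of swapped-out data, assuming 4 KiB pages.
    pages = (1 << 30) // 4096
    print(writeback_metadata_bytes(pages))                     # with rmap
    print(writeback_metadata_bytes(pages, reverse_map=False))  # without rmap
    print(pages * PLAIN_SLOT_BYTES)                            # plain swap only
```

Under these assumptions, a page that went to disk through zswap costs 24 to 32 bytes of metadata, against 8 bytes for the same page on plain swap.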
> - Minimal code review or maintenance burden. All layers are using the exact
> same infrastructure for metadata / allocation / synchronization, making
> all API and conventions consistent and easy to maintain.
> - Writeback, migration and compaction are easily supportable since both
> reverse mapping and reallocation are prepared. We just need a
> folio_realloc_swap to allocate new entries for the existing entry, and
> fill the swap table with a reverse map entry.
> - Fast swapoff: Just read into ghost / virtual swap cache.
Can we get this for disk swap as well? ;)
Zswap swapoff is already fairly fast, albeit CPU intensive. It's the
scattered IO that makes swapoff on disks so terrible.
> The size of the swapfile (si->max) is now just a number, which could be
> changeable at runtime if we have a proper idea how to expose that and
> might need some audit of a few remaining users. But right now, we can
> already easily have a huge swap device with no overhead, for example:
>
> free -m
>                total        used        free      shared  buff/cache   available
> Mem:            1465         250         927           1         356        1215
> Swap:       15269887           0    15269887
I'm not a fan of this. This makes free(1) output kind of useless, and
very misleading. The swap space presented here has nothing to do with
actual swap capacity, and the actual disk swap capacity is obscured.
And how would a user choose this size? How would a distribution?
The only limit is compression ratio, and you don't know this in
advance. This restriction seems pretty arbitrary and avoidable.
There is no good technical reason to present this in any sort of
static fashion.
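As a back-of-the-envelope illustration of how far the advertised number is from real capacity, the sketch below takes the two totals from the quoted free(1) output and asks what compression ratio would be needed to actually fill that swap space from RAM alone. The helper name is hypothetical, and the model deliberately ignores disk backing and metadata overhead.

```python
def required_compression_ratio(swap_mib: int, ram_mib: int) -> float:
    """Compression ratio needed to hold swap_mib of data in ram_mib of RAM.

    Assumed model (hypothetical): all swapped data lives compressed in
    memory, with no disk backend and no metadata overhead.
    """
    if ram_mib <= 0:
        raise ValueError("ram_mib must be positive")
    return swap_mib / ram_mib


if __name__ == "__main__":
    # Totals from the quoted `free -m` output, in MiB.
    swap_total_mib = 15269887
    mem_total_mib = 1465
    ratio = required_compression_ratio(swap_total_mib, mem_total_mib)
    print(f"required ratio: {ratio:.0f}:1")
```

For the quoted machine this comes out above 10000:1, which no realistic workload will reach. The number shown as swap total says nothing about how much can actually be swapped.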