Re: [PATCH v8 0/4] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
From: Nhat Pham
Date: Wed Jun 17 2026 - 13:52:34 EST
On Wed, Jun 17, 2026 at 1:34 AM Youngjun Park <youngjun.park@xxxxxxx> wrote:
>
> This is the v8 series of the swap tier patchset.
>
> Many thanks to Shakeel Butt and Yosry for the reviews and discussions [1].
> The main change in this version is the interface change to use
> memory.swap.tiers.max with '0' (disable) and 'max' (enable) values.
> This mechanism was suggested by Shakeel and Yosry.
I like this interface too :)
>
> This change allows for future extensions to control swap
> between tiers and aligns better with existing memcg interfaces.
> Even with this memcg interface change, only patch #3 needed updates.
> Internally, patch #3 still uses the existing mask processing method
> (which is implementation-efficient), so only the user-facing interface
> was modified.
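
To make the "0/max on the outside, mask on the inside" idea concrete, here is a rough userspace sketch of how per-tier '0' and 'max' settings could fold into a bitmask. Everything here is an assumption for illustration only: the tier names, the bit layout, and the helper names are hypothetical and not taken from patch #3.

```python
# Hypothetical sketch: tier names, bit positions, and helper names are
# illustrative assumptions, not the data structures used by the patch.
TIER_BITS = {"nvme": 0, "hdd": 1, "network": 2}  # assumed tier -> bit index


def apply_tier_setting(mask: int, tier: str, value: str) -> int:
    """Return the allowed-tier mask after applying one (tier, value) setting."""
    bit = 1 << TIER_BITS[tier]
    if value == "max":   # 'max' enables the tier
        return mask | bit
    if value == "0":     # '0' disables the tier
        return mask & ~bit
    raise ValueError(f"unsupported value {value!r}; only '0' and 'max' are accepted")


def allowed_tiers(mask: int) -> list[str]:
    """List the tier names whose bit is set in the mask."""
    return [name for name, idx in TIER_BITS.items() if mask & (1 << idx)]


all_tiers = (1 << len(TIER_BITS)) - 1            # start with every tier enabled
mask = apply_tier_setting(all_tiers, "hdd", "0")  # disable one tier
print(allowed_tiers(mask))                        # ['nvme', 'network']
```

Keeping the mask internal means a later extension (numeric limits instead of only 0/max) would change the parsing step but not necessarily the allocation-side check.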
>
> We also discussed tier extensions. Thanks to Yosry, Nhat and Shakeel for their
> valuable feedback.
>
> Here is a brief summary of our tentative conclusions. Please correct me
> if anything is misrepresented (details in references):
>
> * Zswap tiering [2]:
> Tiering applies only to the vswap + zswap combo. Zswap itself will
> not be tiered, as the current architecture requires a physical device
> for zswap allocation.
I think Yosry wants zswap as a tier, right?
Just that, without vswap, maybe we don't allow it to be a tier by itself?
> * Vswap tiering [3]:
> Vswap should be handled transparently to the user. Vswap itself will
> not be tiered. It may be supported someday if a strong, real use case emerges.
> * Relationship with zswap.writeback [4]:
> If zswap tiering is introduced, it could replace the zswap-only tier.
> However, since zswap cannot be tiered independently, it is still
> needed for non-vswap cases. Separately, the internal logic could
> potentially be integrated into the tiering logic.
> * Tier demotion [5]:
> A separate interface like memory.swap.tiers.demotion might be needed.
> For now, we only support 0/max to enable/disable tiers. In the future,
> we could introduce an "auto" mode to automatically scale the limit
> based on swapfile size and memory.swap.max, similar to the direction
> memory tiering is heading in.
>
> I plan to apply the swap tier infrastructure and the first use case
> (cgroup-based swap control) first, and continue following up on the
> discussions above.
>
> Overview
> ========
>
> Swap Tiers group swap devices into performance classes (e.g. NVMe,
> HDD, Network) and allow per-memcg selection of which tiers to use.
> This mechanism was suggested by Chris Li.
>
>
> #2: Inter-tier promotion and demotion:
> Promotion and demotion apply between tiers, not within a single
> tier. The current interface defines only tier assignment; it does
> not yet define when or how pages move between tiers. Two triggering
> models are possible:
>
> (a) User-triggered: userspace explicitly initiates migration between
> tiers (e.g. via a new interface or existing move_pages semantics).
> (b) Kernel-triggered: the kernel moves pages between tiers at
> appropriate points such as reclaim or refault.
We'll likely need some kernel-triggered mechanism, or we'd have LRU inversion :)
Cold pages will fill up fast tiers first, and more recent/warm pages
will land on slow tiers...
We'll also need to enforce isolation/fairness to make sure no workload
hoards the fast tiers too (but that probably requires demotion
support).
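
A toy model of the inversion, in case it helps. This is only a sketch under simplifying assumptions: reclaim evicts strictly coldest-first, the fast tier is filled before the slow one, and nothing is ever demoted. The page names, ages, and the helper name are made up for illustration.

```python
def place_without_demotion(eviction_order, fast_capacity):
    """Fill the fast tier first, spill to the slow tier once it is full.

    Pages never move after placement (no promotion or demotion).
    """
    fast, slow = [], []
    for page in eviction_order:
        (fast if len(fast) < fast_capacity else slow).append(page)
    return fast, slow


# "Age" is the time since last access; larger means colder.
ages = {"a": 90, "b": 70, "c": 50, "d": 30, "e": 10}

# Reclaim swaps out the coldest pages first.
eviction_order = sorted(ages, key=ages.get, reverse=True)

fast, slow = place_without_demotion(eviction_order, fast_capacity=2)
print("fast:", fast)  # fast: ['a', 'b']
print("slow:", slow)  # slow: ['c', 'd', 'e']
```

The two coldest pages end up pinned on the fast tier while every warmer page lands on the slow one, which is the opposite of what we want for refault latency.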
>
> #3: Per-VMA, per-process swap and BPF:
> Beyond memcg-based swap, tiers could be extended to per-VMA or per-process
> swap, or tier selection could be driven by a BPF program.
>
> #4: Zswap and vswap tiering:
> Tiering applies to the vswap + zswap combination.
>
> #5: Vswap on/off control:
> Currently not supported. If a strong use case arises where vswap needs
> to be controlled by memcg, the tier interface could be used for it.
+1.
Also, per-si/per-tier per-CPU allocation caching? :) Kairui already
has a patch for it, IIUC, but if not it's pretty critical I'd say.
BTW, can we add some selftests, to make sure the new interface works
as expected, and to have example programs for new users to model their
scripts after? :)