Re: [PATCH v8 0/4] mm/swap, memcg: Introduce swap tiers for cgroup based swap control

From: YoungJun Park

Date: Wed Jun 17 2026 - 21:48:08 EST


On Wed, Jun 17, 2026 at 01:50:49PM -0400, Nhat Pham wrote:

> On Wed, Jun 17, 2026 at 1:34 AM Youngjun Park <youngjun.park@xxxxxxx> wrote:
> >
> > This is the v8 series of the swap tier patchset.
> >
> > Great thanks to Shakeel Butt and Yosry for the reviews and discussions [1].
> > The main change in this version is the interface change to use
> > memory.swap.tiers.max with '0' (disable) and 'max' (enable) values.
> > This mechanism was suggested by Shakeel and Yosry
>
> I like this interface too :)

Good to hear. Now it looks like we have found a memcg interface that
aligns well with the existing memcg model.

I like this idea as well. Thanks again to Shakeel Butt and Yosry.

> > Here is a brief summary of our tentative conclusions. Please correct me
> > if anything is misrepresented (details in references):
> >
> > * Zswap tiering [2]:
> > Tiering applies only to the vswap + zswap combo. Zswap itself will
> > not be tiered, as the current architecture requires a physical device
> > for zswap allocation.
>
> I think Yosry wants zswap as a tier, right?
>
> Just that without vswap, maybe don't allow it to be a tier of itself?

With the current architecture, zswap is a separate layer that users
cannot dynamically specify as a tier, so it is not tiered by itself.

Once your vswap work lands, I think we can make zswap the default,
top-level tier.

After that, we can also look into cleaning up the zswap.writeback
interface together.

> > #2: Inter-tier promotion and demotion:
> > Promotion and demotion apply between tiers, not within a single
> > tier. The current interface defines only tier assignment; it does
> > not yet define when or how pages move between tiers. Two triggering
> > models are possible:
> >
> > (a) User-triggered: userspace explicitly initiates migration between
> > tiers (e.g. via a new interface or existing move_pages semantics).
> > (b) Kernel-triggered: the kernel moves pages between tiers at
> > appropriate points such as reclaim or refault.
>
> We'll likely need some kernel-triggered mechanism, or we'd have LRU inversion :)
>
> Cold pages will fill up fast tiers first, and more recent/warm pages
> will land on slow tiers...

Yeah, good point!
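
To make the inversion concrete, here is a toy sketch of how I read it. It
is not kernel code and does not reflect the patchset: the two-tier layout,
the capacities, the page stream and the swap_out() helper are all made up
for illustration. It only assumes that reclaim swaps out the coldest pages
first and that nothing moves between tiers afterwards.

```python
# Toy model only: two swap tiers, fixed fast-tier capacity, no demotion.
# All names and numbers here are hypothetical, chosen for illustration.

def swap_out(pages, fast_capacity):
    """Place pages in swap-out order: fill the fast tier first, then
    spill to the slow tier. Pages never move between tiers afterwards."""
    fast, slow = [], []
    for page in pages:
        if len(fast) < fast_capacity:
            fast.append(page)
        else:
            slow.append(page)
    return fast, slow


def avg_age(tier):
    """Average time since last access for the pages in a tier."""
    return sum(p["age"] for p in tier) / len(tier)


# Reclaim evicts the coldest pages first, so the swap-out order runs
# from coldest (age 9) to warmest (age 0).
swap_out_order = [{"id": i, "age": 9 - i} for i in range(10)]

fast, slow = swap_out(swap_out_order, fast_capacity=4)

print("fast tier avg age:", avg_age(fast))  # 7.5
print("slow tier avg age:", avg_age(slow))  # 2.5
```

In this toy run the fast tier ends up holding the oldest pages and the
warmer ones spill to the slow tier, which is the inversion you describe.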

> We'll also need to enforce isolation/fairness to make sure no workload
> hoard the fast tiers too (but that probably requires demotion
> support).

Right, that makes sense.

BTW, one thing I am curious about is whether there are strong
real-world use cases that require demotion/promotion.
In theory this looks useful, but it would help to better understand
the requirements from real deployments.

> >
> > #3: Per-VMA, per-process swap and BPF:
> > Beyond memcg-based swap, this could be extended to per-VMA or
> > per-process swap, or it could be used from a BPF program.
> >
> > #4: Zswap and vswap tiering:
> > Tiering applies to the vswap + zswap combination.
> >
> > #5: Vswap on/off control:
> > Currently not supported. If a strong use case arises where vswap needs
> > to be controlled by memcg, the tier interface could be used for it.
>
> +1.
>
> Also, per-si/per-tier per-CPU allocation caching? :) Kairui already
> has a patch for it, IIUC, but if not it's pretty critical I'd say.

Yes, I missed that. Thank you for bringing it up.
We need an implementation that integrates this with the per-CPU
allocation currently implemented on the vswap side.

If Kairui's patch lands, my patch #4 can also be optimized based on it.

> BTW, can we add some selftests, to make sure the new interface works
> as expected, and to have example programs for new users to model their
> scripts after? :)

Yes, I agree. I think selftests are necessary.

Do you want them to be introduced in this patchset, or would it be okay
to add them separately as follow-up work?