Re: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback

From: YoungJun Park

Date: Sun Jun 14 2026 - 05:23:44 EST


....
> > Based on the memcg interface currently proposed in swap_tier
> > (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> > with the current direction. It provides a foundation for selectively
> > targeting devices in tier order.
>
> Here instead of cpuset like interface, we may want more zswap like interface
> where you can put limit on the usage i.e. memory.swap.tier*.max. We can start
> with allowing only two values i.e. 0 and max which effectively will be the
> same as what you need.
>

Good idea, and it's certainly feasible. When I considered this a while
ago, the reasons I didn't take this direction were:

1. There's no real-world usage for adjusting the swap tier amount (it's
either 0 or MAX). That said, your suggestion to initially allow only
0 and max is the deciding point, and it's making me reconsider.

2. The implementation cost seems high. The current implementation
handles this at runtime via simple masking.

3. Relationship with swap.max:
- If we tie it to the current interface, wouldn't limiting the swap
amount within a selected tier already be possible? I wonder if
that alone is enough.
- If we add tier.max, it would need to be a subset of swap.max.
(Any other complexities here?)

4. vswap enable/disable: vswap doesn't seem to have an amount-control
aspect, so an on/off semantic would be clearer.
https://lore.kernel.org/linux-mm/ai5kOOmR1LPTWs1J@yjaykim-PowerEdge-T330/T/#m8831ec057bf9387978d3bd698f51920600e09a04

In that case, the internal logic could stay roughly the same rather
than counting via a page counter. Something like:

1. Change the interface shell: tier.*.max, accepting only 0 or max.
2. Keep the internal logic as is: 0 disables the mask (child memcgs
off too), max enables it (child memcgs on too).
3. memory.zswap.max integrates naturally (it's memory."tier_name".max).
4. Extend later if use cases arise.
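
To make steps 1 and 2 concrete, here is a rough userspace sketch, not
kernel code. The struct, the function names, and the one-bit-per-tier
layout are all made up for illustration; the real implementation may
look quite different. It only shows that a 0/max file can sit on top of
the existing mask, with the effective value being the AND over all
ancestors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not kernel code: one bit per swap tier. */
struct tier_cg {
	const struct tier_cg *parent;	/* NULL for the root */
	unsigned int mask;		/* tiers enabled locally */
};

/* Accept only "0" or "max" for a tier's .max file; reject anything else. */
static int tier_max_write(struct tier_cg *cg, unsigned int tier_bit,
			  const char *buf)
{
	if (strcmp(buf, "max") == 0)
		cg->mask |= tier_bit;
	else if (strcmp(buf, "0") == 0)
		cg->mask &= ~tier_bit;
	else
		return -1;	/* would be -EINVAL in the kernel */
	return 0;
}

/* A tier is usable only if every ancestor also leaves it enabled. */
static unsigned int tier_effective_mask(const struct tier_cg *cg)
{
	unsigned int eff = ~0u;

	for (; cg; cg = cg->parent)
		eff &= cg->mask;
	return eff;
}
```

No page counter is involved here, so extending to real byte limits
later would be a separate step.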

On balance I still lean toward the current interface, but if a per-tier
max is the better fit for memcg's direction and others feel the same,
I'm happy to switch. I'd like to hear Shakeel's thoughts again, and I'm
curious about others' opinions too.

A few more perspectives on the points below.

> I will respond to your other points later when I have time.

> >
> > To summarize the discussions so far, the following points align well.
> >
> > - Per-cgroup swap control, as I suggested.
> > - Proactive zswap writeback (Hao's usecase)
> > - Swap device target demotion (even better if it can be selective), as you mentioned:
> > https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@xxxxxxxxx/
> > - Virtual Swap on/off in the future, as Nhat mentioned:
> > https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@xxxxxxxxx/
> > - The memory.zswap.writeback alternative (no hierarchy model conflict)
> > - zswap is first swap tier.
> > - Promotion. (Also better for selective usage)
> > - tier based swap policy (e.g., round-robin...)
> >
> > To accelerate this work, I believe we should reach a consensus and
> > merge the currently proposed swap_tier interface :)
> >
> > If the above approach is difficult, I would like to suggest an
> > alternative for progress with the memcg interfaces removed:
> >
> > 1) We could make zswap the first tier and create
> > a use case where memory.zswap.writeback internally is handled by tier logic.
> >
> > 2) Or simply merge the swap_tier infrastructure itself first.
> >
> > This would allow the swap_tier infrastructure to be merged and discussed
> > more easily.
> >
> > If adopting swap_tier takes longer anyway, this would let us take the
> > next steps as an experimental feature:
> >
> > - Apply per-cgroup swap as an experimental (debugfs) feature.
> > - Apply Hao's use case experimentally or as it is as Yosry suggested.
> > (future migration to swap tier)
> >
> > What do you think?
> >
> > (FYI: My emails to kernel.org are failing due to internal server issues.)
> >
> > Thank you
> > Youngjun Park

Let me clarify a part I wrote confusingly. Handling
memory.zswap.writeback via tiers is possible, but I don't think the
interface itself would be replaced even if memory.swap.tiers is adopted.

Selecting only zswap in memory.swap.tiers would not just disable
writeback; it would also block regular swap entirely, which differs
slightly from the current semantic. (Per the cgroup v2 docs, setting
memory.zswap.writeback to 0 is subtly different from setting
memory.swap.max to 0, since it still allows pages to be written to the
zswap pool; it has no effect if zswap is disabled, and swapping is
allowed unless memory.swap.max is set to 0.)
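
The difference can be written down as a small decision function. This
is a simplified model based only on the wording above, not on the
actual code paths; the struct and its field names are hypothetical, and
memory.swap.max is left out on purpose.

```c
#include <stdbool.h>

/* Hypothetical inputs for a simplified model; none of these are kernel types. */
struct swap_cfg {
	bool zswap_enabled;	/* zswap turned on system-wide */
	bool zswap_writeback;	/* memory.zswap.writeback (1 is the default) */
	bool zswap_only_tier;	/* memory.swap.tiers selects only the zswap tier */
};

/* May a page that zswap did not take go to a backing swap device? */
static bool may_use_swap_device(const struct swap_cfg *c)
{
	if (c->zswap_only_tier)
		return false;	/* tier selection blocks regular swap outright */
	if (!c->zswap_enabled)
		return true;	/* writeback=0 has no effect without zswap */
	return c->zswap_writeback;
}
```

The case that differs is zswap disabled: memory.zswap.writeback=0 still
lets pages reach the swap device, while a zswap-only tier selection
does not.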

So the interface itself needs to be retained, and it could be extended
toward selective writeback — e.g., passing a desired tier into
memory.zswap.writeback so writeback targets only that tier. Currently
it only controls on/off. Other tiers probably don't need this;
demotion based on the selected tier should be enough.
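
As a rough idea of what such an extension could accept, here is a
userspace sketch. Everything in it is an assumption for discussion: the
tier names "ssd" and "hdd", the bit layout, and the function name are
placeholders, and nothing like this exists in the current patches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tier table; names and bits are placeholders. */
struct tier_name {
	const char *name;
	unsigned int bit;
};

static const struct tier_name tiers[] = {
	{ "ssd", 1u << 0 },
	{ "hdd", 1u << 1 },
};

#define ALL_TIERS ((1u << 0) | (1u << 1))

/*
 * Parse a value written to memory.zswap.writeback.
 * "0" and "1" keep today's on/off meaning; a tier name restricts
 * writeback to that tier. Returns 0 and fills *mask, or -1 on bad input.
 */
static int zswap_writeback_parse(const char *buf, unsigned int *mask)
{
	size_t i;

	if (strcmp(buf, "0") == 0) {
		*mask = 0;
		return 0;
	}
	if (strcmp(buf, "1") == 0) {
		*mask = ALL_TIERS;
		return 0;
	}
	for (i = 0; i < sizeof(tiers) / sizeof(tiers[0]); i++) {
		if (strcmp(buf, tiers[i].name) == 0) {
			*mask = tiers[i].bit;
			return 0;
		}
	}
	return -1;
}
```

Existing users writing 0 or 1 would see no change; tier names that
collide with "0" or "1" would have to be disallowed.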

Thanks,
Youngjun Park