Re: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback

From: Yosry Ahmed

Date: Mon Jun 15 2026 - 15:55:32 EST


> In that case, the internal logic could stay roughly the same rather
> than counting via a page counter. Something like:
>
> 1. Change the interface shell to tier.*.max, allowing only 0 or max.

What about a single interface, as I suggested, to remain consistent
with memory tiering?

> 2. Keep the internal logic as is: 0 disables the mask (child memcgs
> off too), max enables it (child memcgs on too).

I think a child should be able to disable a swap tier enabled by the
parent, but not vice versa.
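To make the hierarchy rule concrete, here is a minimal C sketch. It assumes each swap tier is one bit in a per-memcg mask and that a per-tier file accepts only "0" or "max". A cgroup's effective mask is its own mask ANDed with the masks of all its ancestors, so a child can turn off a tier its parent allows but cannot turn on one an ancestor disabled. This matches how memory.zswap.writeback already behaves: writeback is implicitly disabled for a child if any ancestor disables it. `struct memcg_sketch`, `tier_max_write()` and `tier_effective_mask()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one bit per swap tier (e.g. bit 0 = zswap, bit 1 = SSD). */
struct memcg_sketch {
	const struct memcg_sketch *parent; /* NULL for the root cgroup */
	unsigned int tier_mask;            /* tiers this cgroup itself allows */
};

/*
 * Hypothetical write handler for a per-tier file that accepts only
 * "0" (disable the tier) or "max" (enable it). Other values are rejected.
 * Assumes the trailing newline has already been stripped from buf.
 */
int tier_max_write(struct memcg_sketch *memcg, unsigned int tier,
		   const char *buf)
{
	unsigned int bit;

	assert(tier < sizeof(unsigned int) * 8); /* keep the shift defined */
	bit = 1u << tier;

	if (strcmp(buf, "0") == 0)
		memcg->tier_mask &= ~bit;
	else if (strcmp(buf, "max") == 0)
		memcg->tier_mask |= bit;
	else
		return -EINVAL;
	return 0;
}

/*
 * A tier is usable only if this cgroup and every ancestor allow it, so a
 * child can disable a tier its parent enabled, but it cannot re-enable a
 * tier that an ancestor disabled.
 */
unsigned int tier_effective_mask(const struct memcg_sketch *memcg)
{
	unsigned int mask = ~0u;

	for (; memcg; memcg = memcg->parent)
		mask &= memcg->tier_mask;
	return mask;
}
```

In this sketch, writing "max" in a child after an ancestor wrote "0" is accepted but has no effect on the effective mask; rejecting such writes outright would be an equally valid choice.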

> 3. memory.zswap.max integrates naturally (it's memory."tier_name".max).

Not really. memory.zswap.max is in terms of memory usage (compressed
size), not swap usage (uncompressed size).
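A small C sketch of the accounting difference, assuming 4 KiB pages and a hypothetical pair of per-cgroup byte counters. A page stored in zswap still occupies a swap slot, so swap usage grows by one full page, while the zswap counter (what memory.zswap.max is compared against) grows only by the compressed object size. The struct and function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical per-cgroup counters, both in bytes. */
struct swap_usage_sketch {
	size_t swap_bytes;  /* like memory.swap.current: uncompressed size */
	size_t zswap_bytes; /* like memory.zswap.current: compressed size */
};

/*
 * Account one page stored in zswap. The page still occupies a swap slot,
 * so swap usage grows by a full page, but zswap usage grows only by the
 * size of the compressed object.
 */
void account_zswap_store(struct swap_usage_sketch *u, size_t compressed_len)
{
	/* Sketch assumption: a stored object is never larger than a page. */
	assert(compressed_len > 0 && compressed_len <= SKETCH_PAGE_SIZE);
	u->swap_bytes += SKETCH_PAGE_SIZE;
	u->zswap_bytes += compressed_len;
}

/* Check a hypothetical zswap limit against the compressed footprint. */
int zswap_over_limit(const struct swap_usage_sketch *u, size_t zswap_max)
{
	return u->zswap_bytes > zswap_max;
}
```

With three pages compressed to 1 KiB each, swap usage is 12 KiB while zswap usage is 3 KiB, so a generic memory."tier_name".max file would have to pick one of these two units.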

[..]
> Let me clarify a part I wrote confusingly. Handling
> memory.zswap.writeback via tiers is possible, but I don't think the
> interface itself would be replaced even if memory.swap.tiers is adopted.
>
> Selecting only zswap in memory.swap.tiers would not just disable
> writeback; it would also block regular swap entirely, which differs
> slightly from the current semantics. (... "Per the cgroup v2 docs: a
> zswap-only tier setting is subtly different from setting
> memory.swap.max to 0, since it still allows pages to be written to the
> zswap pool; this has no effect if zswap is disabled, and swapping is
> allowed unless memory.swap.max is set to 0.")

I don't understand. How is disabling zswap writeback not equivalent to
only enabling zswap as a tier?

Do you just mean that disabling zswap writeback is a no-op if zswap is
disabled? It's a different interface, so I think a small semantic
difference is okay. In practice, I doubt that zswap is being disabled
at runtime.
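The difference can be modeled in a few lines of C. This sketch loosely follows the documented memory.zswap.writeback semantics (with writeback disabled, a failed zswap store keeps the page in memory; the knob has no effect when zswap is disabled) and compares it against a hypothetical tier mask in which zswap is the only allowed tier. The enum, function names and tier bits are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a page being reclaimed ends up (sketch). */
enum swap_dest { DEST_MEMORY, DEST_ZSWAP, DEST_DISK };

#define TIER_ZSWAP (1u << 0) /* hypothetical tier bits */
#define TIER_DISK  (1u << 1)

/*
 * memory.zswap.writeback-style knob: with writeback disabled, a failed
 * zswap store keeps the page in memory; if zswap is disabled, the knob is
 * ignored and the page goes to the swap device.
 */
enum swap_dest dest_with_writeback_knob(bool zswap_enabled, bool store_ok,
					bool writeback)
{
	assert(zswap_enabled || !store_ok); /* no store without zswap */
	if (zswap_enabled && store_ok)
		return DEST_ZSWAP;
	if (zswap_enabled && !writeback)
		return DEST_MEMORY;
	return DEST_DISK;
}

/*
 * Hypothetical tier-mask model: only tiers present in the mask may be
 * used, regardless of whether zswap is enabled globally.
 */
enum swap_dest dest_with_tier_mask(bool zswap_enabled, bool store_ok,
				   unsigned int mask)
{
	assert(zswap_enabled || !store_ok); /* no store without zswap */
	if ((mask & TIER_ZSWAP) && zswap_enabled && store_ok)
		return DEST_ZSWAP;
	if (mask & TIER_DISK)
		return DEST_DISK;
	return DEST_MEMORY;
}
```

The two models agree whenever zswap is enabled; they diverge only when zswap is disabled, where the knob falls back to the swap device and a zswap-only mask keeps the page in memory.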

>
> So the interface itself needs to be retained, and it could be extended
> toward selective writeback, e.g., passing a desired tier into
> memory.zswap.writeback so writeback targets only that tier. Currently
> it only controls on/off. Other tiers probably don't need this; demotion
> based on the selected tier should be enough.
>
> Thanks,
> Youngjun Park
>
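As a rough sketch of the selective-writeback extension described above, assume memory.zswap.writeback were extended to accept "0", "1", or a tier name; only "0" and "1" exist today. A write handler could then map the value to a mask of allowed writeback targets. The tier names, macros and `zswap_writeback_parse()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lower tiers that zswap could write back to. */
static const char *const writeback_tiers[] = { "ssd", "hdd" };
#define NR_WB_TIERS (sizeof(writeback_tiers) / sizeof(writeback_tiers[0]))
#define WB_ALL_TIERS ((1u << NR_WB_TIERS) - 1)

/*
 * Parse a value for a hypothetical extended memory.zswap.writeback:
 *   "0"      -> no writeback
 *   "1"      -> writeback allowed to any lower tier (today's "enabled")
 *   "<tier>" -> writeback only to the named tier
 * Assumes the trailing newline has already been stripped. Stores the
 * allowed-target mask in *mask and returns 0, or returns -EINVAL.
 */
int zswap_writeback_parse(const char *buf, unsigned int *mask)
{
	size_t i;

	assert(buf && mask);
	if (strcmp(buf, "0") == 0) {
		*mask = 0;
		return 0;
	}
	if (strcmp(buf, "1") == 0) {
		*mask = WB_ALL_TIERS;
		return 0;
	}
	for (i = 0; i < NR_WB_TIERS; i++) {
		if (strcmp(buf, writeback_tiers[i]) == 0) {
			*mask = 1u << i;
			return 0;
		}
	}
	return -EINVAL;
}
```

Keeping "0" and "1" with their current meaning means existing users of the on/off interface would see no change.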

On Sun, Jun 14, 2026 at 2:23 AM YoungJun Park <youngjun.park@xxxxxxx> wrote:
>
> ....
> > > Based on the memcg interface currently proposed in swap_tier
> > > (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> > > with the current direction. It provides a foundation for selectively
> > > targeting devices in tier order.
> >
> > Here, instead of a cpuset-like interface, we may want a more zswap-like
> > interface where you can put a limit on the usage, i.e.
> > memory.swap.tier*.max. We can start with allowing only two values, 0
> > and max, which will effectively be the same as what you need.
> >
>
> Good idea, and it's certainly feasible. When I considered this a while
> ago, the reasons I didn't take this direction were:
>
> 1. There's no real-world usage for adjusting the swap tier amount (it's
> either 0 or max). That said, your suggestion to initially allow only
> 0 and max is the key point, and it's making me reconsider.
>
> 2. The implementation cost seems high. The current implementation
> handles this at runtime via simple masking.
>
> 3. Relationship with swap.max:
> - If we tie it to the current interface, wouldn't limiting the swap
> amount within a selected tier already be possible? I wonder if
> that alone is enough.
> - If we add tier.max, it would need to be a subset of swap.max.
> (Any other complexities here?)
>
> 4. vswap enable/disable: vswap doesn't seem to have an amount-control
> aspect, so an on/off semantic would be clearer.
> https://lore.kernel.org/linux-mm/ai5kOOmR1LPTWs1J@yjaykim-PowerEdge-T330/T/#m8831ec057bf9387978d3bd698f51920600e09a04
>
> In that case, the internal logic could stay roughly the same rather
> than counting via a page counter. Something like:
>
> 1. Change the interface shell to tier.*.max, allowing only 0 or max.
> 2. Keep the internal logic as is: 0 disables the mask (child memcgs
> off too), max enables it (child memcgs on too).
> 3. memory.zswap.max integrates naturally (it's memory."tier_name".max).
> 4. Extend later if use cases arise.
>
> On balance I still lean toward the current interface, but if a per-tier
> max is the better fit for memcg's direction and others feel the same,
> I'm happy to switch. I'd like to hear Shakeel's thoughts again, and I'm
> curious about others' opinions too.
>
> A few more perspectives on the points below.
>
> > I will respond to your other points later when I have time.
>
> > >
> > > To summarize the discussions so far, the following points align well.
> > >
> > > - Per-cgroup swap control, as I suggested.
> > > - Proactive zswap writeback (Hao's use case)
> > > - Swap-device-targeted demotion (even better if it can be selective), as you mentioned:
> > > https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@xxxxxxxxx/
> > > - Virtual Swap on/off in the future, as Nhat mentioned:
> > > https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@xxxxxxxxx/
> > > - The memory.zswap.writeback alternative (no hierarchy model conflict)
> > > - zswap as the first swap tier
> > > - Promotion (also better for selective usage)
> > > - Tier-based swap policy (e.g., round-robin...)
> > >
> > > To accelerate this work, I believe we should reach a consensus and
> > > merge the currently proposed swap_tier interface :)
> > >
> > > If the above approach is difficult, I would like to suggest an
> > > alternative that makes progress with the memcg interfaces removed:
> > >
> > > 1) We could make zswap the first tier and create a use case
> > > where memory.zswap.writeback is handled internally by tier logic.
> > >
> > > 2) Or simply merge the swap_tier infrastructure itself first.
> > >
> > > This would allow the swap_tier infrastructure to be merged and discussed
> > > more easily.
> > >
> > > If adopting swap_tier takes longer anyway, this lets us make progress
> > > on the next steps as an experimental feature:
> > >
> > > - Apply per-cgroup swap as an experimental (debugfs) feature.
> > > - Apply Hao's use case experimentally, or as-is as Yosry suggested
> > > (with a future migration to swap tier).
> > >
> > > What do you think?
> > >
> > > (FYI: My emails to kernel.org are failing due to internal server issues.)
> > >
> > > Thank you
> > > Youngjun Park
>
> Let me clarify a part I wrote confusingly. Handling
> memory.zswap.writeback via tiers is possible, but I don't think the
> interface itself would be replaced even if memory.swap.tiers is adopted.
>
> Selecting only zswap in memory.swap.tiers would not just disable
> writeback; it would also block regular swap entirely, which differs
> slightly from the current semantics. (... "Per the cgroup v2 docs: a
> zswap-only tier setting is subtly different from setting
> memory.swap.max to 0, since it still allows pages to be written to the
> zswap pool; this has no effect if zswap is disabled, and swapping is
> allowed unless memory.swap.max is set to 0.")
>
> So the interface itself needs to be retained, and it could be extended
> toward selective writeback, e.g., passing a desired tier into
> memory.zswap.writeback so writeback targets only that tier. Currently
> it only controls on/off. Other tiers probably don't need this; demotion
> based on the selected tier should be enough.
>
> Thanks,
> Youngjun Park
>