Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback

From: YoungJun Park

Date: Tue Jun 09 2026 - 00:35:13 EST


On Mon, Jun 08, 2026 at 03:27:07PM -0700, Yosry Ahmed wrote:

+Chris +Kairui +Baoquan

Hello

Thanks for inviting me to the discussion, Shakeel.

> > > > Youngjun is working on swap tiers. At the moment he is more interested in
> > > > allowing a specific swap device to a memcg or not. I can imagine in future there
> > > > will be use-cases where there will be a need to demote data on higher tier swap
> > > > to lower tier swap. What would be the appropriate interface?

Speaking of my work on swap tiers, I recently submitted a patch and am
currently considering memcg integration:
https://lore.kernel.org/linux-mm/20260527062247.3440692-1-youngjun.park@xxxxxxx/

The future use-cases imagined above seem to align with this
direction. (BTW, I am currently waiting for reviews/feedback from the memcg
folks on this patch. Any reviews would be highly appreciated!)

We could potentially assign a target tier
for writeback within the existing memory.zswap.writeback interface.

For instance, '0' could mean disabled, while non-zero values could represent
specific tiers, which would maintain backward compatibility with the current
version. Alternatively, if zswap is treated as the default top tier,
the `memory.swap.tiers` interface could potentially replace `memory.zswap.writeback`.

Furthermore, this could be expanded to allow user-triggered demotion
between swap tiers.

Based on the current patch's ideas combined with my swap tiers concept:

Assuming a hierarchy like:
zswap -> tier1 (SSD swap) -> tier2 (HDD swap) -> tier3 (Network swap)

We could configure the active tiers via a setting like `memory.swap.tiers`
(tier2 enabled, tier3 enabled).

For example, the concept of `echo "100M zswap_writeback_only" > memory.reclaim`
could be extended. A user could run `echo "100M tier2" > memory.reclaim`
to explicitly trigger demotion from tier2 to tier3.
(BTW, if we combine these features, my personal preference for the keyword
format would be `<size> <demote_prefix><tier_name>`. I think it would be
better to explicitly indicate that it is a swap demotion by using a specific
prefix followed by the tier name. Alternatively, the demotion indicator
could be a separate key.)
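
To make the `<size> <demote_prefix><tier_name>` format concrete, here is a
rough userspace sketch in Python (not kernel code) of how such a string might
be parsed. The `demote:` prefix spelling, the function name, and the reduced
set of size suffixes are all assumptions made for illustration; nothing in
this thread settles them.

```python
import re

# Hypothetical prefix; the thread does not settle on a spelling.
DEMOTE_PREFIX = "demote:"

# Binary size suffixes; a simplified subset chosen for this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_reclaim_request(text):
    """Parse '<size> [<demote_prefix><tier_name>]' into (bytes, tier_name).

    tier_name is None when no demotion key is given (plain reclaim).
    """
    fields = text.split()
    if not fields or len(fields) > 2:
        raise ValueError(f"expected '<size> [<key>]', got {text!r}")

    m = re.fullmatch(r"(\d+)([KMG]?)", fields[0], flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"bad size: {fields[0]!r}")
    nbytes = int(m.group(1)) * _UNITS[m.group(2).upper()]

    if len(fields) == 1:
        return nbytes, None

    key = fields[1]
    # Require the explicit prefix and a non-empty tier name after it.
    if not key.startswith(DEMOTE_PREFIX) or len(key) == len(DEMOTE_PREFIX):
        raise ValueError(f"bad demotion key: {key!r}")
    return nbytes, key[len(DEMOTE_PREFIX):]
```

With these assumptions, `parse_reclaim_request("100M demote:tier2")` yields the
byte count for 100M together with `"tier2"`, while a bare `"100M tier2"` is
rejected because the key lacks the prefix.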

So, the whole picture would look something like this:
- memory.swap.tiers : Interface for configuring the tier mask.
- memory.reclaim : Entry point for user-triggered demotion.
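
As a minimal sketch of how the two pieces could interact, the Python below
(userspace illustration only, not kernel code) picks a demotion target from
the example hierarchy above. The tier names, the function name, and the
policy of skipping tiers that are not enabled in the mask are assumptions;
the actual semantics are still open.

```python
# Order follows the example hierarchy above; names are illustrative.
TIERS = ["zswap", "tier1", "tier2", "tier3"]


def demotion_target(source, enabled):
    """Return the next enabled tier below `source`, or None if there is none.

    `enabled` stands in for the per-memcg tier mask; tiers that are not in
    it are skipped (an assumed policy for this sketch).
    """
    idx = TIERS.index(source)  # raises ValueError for an unknown tier
    for name in TIERS[idx + 1:]:
        if name in enabled:
            return name
    return None
```

With tier2 and tier3 enabled, `demotion_target("tier2", {"tier2", "tier3"})`
returns `"tier3"`, and a request against tier3 returns `None` because there
is nothing below it.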

> > > Things will probably get more
> > > blurry with memory tiers and compressed memory nodes though.
> >
> > I think there will still be distinction between byte addressable and fault on
> > access devices.
>
> Yeah, I think it makes sense to define "swap" as fault on access
> (zswap, SSD, etc), and memory tiers as byte-addressable (even if you
> put an SSD behind CXL and make it byte-addressable). But I also
> remember seeing discussions about unifying memory tiers and swap in a
> way, and it makes sense from a reclaim perspective (swap or demote
> first?).


> > > > will be use-cases where there will be a need to demote data on higher tier swap
> > > > to lower tier swap. What would be the appropriate interface?
> > > >
> > > > BTW does zswap folks think of zswap as a top swap tier or something different?
> > >
> > > I haven't been following the swap tiers work closely, but personally I
> > > do think of zswap as a top swap tier.

Regarding zswap's position, I agree it needs to be defined as the default,
top-most tier in swap_tier. In my early RFC, I allocated a separate tier
specifically for zswap:
(https://lore.kernel.org/linux-mm/20251109124947.1101520-3-youngjun.park@xxxxxxx/)

> > Same for me though I imagine swap tiers would introduce some duplication i.e.
> > different way (interface) to set limits for swap tiers for a given memcg.
> >

I also agree with the concern about interface duplication. We will eventually
need a mechanism to control swap amounts per tier, which requires thinking
about its relationship with swap.max. (I raised this as an open question in
the "Further Discussion and Open Questions" part of my early RFC:
https://lore.kernel.org/linux-mm/20251109124947.1101520-1-youngjun.park@xxxxxxx/)
However, since this feature is necessary anyway, wouldn't the proposed
interface be acceptable without causing conflicts at this early stage?

Additionally, `memory.zswap.writeback` seems redundant. Restricting a cgroup
to only use the zswap tier (assuming it's the first tier) is practically
identical to disabling `memory.zswap.writeback` (correct me if I'm wrong).
That said, I think integrating it would not be a problem: e.g.,
`memory.zswap.writeback` could internally act as an alias for setting
`memory.swap.tiers` to 'zswap only'.
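
A rough sketch of that alias idea in Python (userspace illustration only):
the legacy 0/1 value is translated into a set of allowed tiers. The tier
names, the function name, and the choice that '1' maps to "all tiers
allowed" are assumptions, not something the swap tiers patches define.

```python
# Illustrative tier names, highest tier first.
ALL_TIERS = ("zswap", "tier1", "tier2", "tier3")


def tiers_from_zswap_writeback(value):
    """Map a legacy memory.zswap.writeback value (0 or 1) to allowed tiers."""
    if value == 0:
        # Writeback disabled: the cgroup stays in the zswap tier only.
        return {"zswap"}
    if value == 1:
        # Writeback enabled: assumed here to mean every tier is allowed.
        return set(ALL_TIERS)
    raise ValueError(f"unsupported memory.zswap.writeback value: {value!r}")
```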

BR,
Youngjun Park