Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg
From: Yosry Ahmed
Date: Thu Jun 04 2026 - 01:41:06 EST
> >> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
> >> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
> >> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
> >> even though memcg2 still has plenty of inactive zswap pages, it will
> >> continue to write back memcg1's active zswap pages. Writing back active
> >> zswap pages causes the user-space agent to prematurely abort the writeback
> >> because it detects that certain memcg metrics have exceeded predefined
> >> thresholds.
> >
> > This will only happen if the reclaim size is smaller than the batch
> > size, right? Otherwise the kernel should reclaim more or less equally
> > from both memcgs?
> >
>
> I gave it some thought. Not using a cursor could lead to unfairness
> issues with certain writeback sizes:
>
> - If the writeback size is an odd multiple of WB_BATCH (e.g.,
> triggering a writeback of 3 * WB_BATCH), with 2 child cgroups, the
> writeback ratio might end up being 2:1.
> - If a memcg has 5 child cgroups and a writeback of 2 * WB_BATCH is
> triggered, it might repeatedly write back from only the first 2 child
> cgroups.
>
> Although setting a smaller WB_BATCH might mitigate this unfairness, it
> could hurt writeback efficiency. Let's just use per-memcg cursors to
> completely fix these corner cases.

Exactly, the batch size should be small enough that any unfairness is
not a problem. I would honestly just do batching without a per-memcg
cursor, unless we have numbers to prove that the efficiency is
affected when we use a small batch size. Let's only introduce
complexity when it is needed, please.