Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg
From: Hao Jia
Date: Thu Jun 04 2026 - 09:19:07 EST
On 2026/6/4 13:34, Yosry Ahmed wrote:
>>>> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
>>>> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
>>>> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
>>>> even though memcg2 still has plenty of inactive zswap pages, the writeback
>>>> will continue with memcg1's active zswap pages. Writing back active
>>>> zswap pages causes the user-space agent to prematurely abort the writeback
>>>> because it detects that certain memcg metrics have exceeded predefined
>>>> thresholds.
>>> This will only happen if the reclaim size is smaller than the batch
>>> size, right? Otherwise the kernel should reclaim more or less equally
>>> from both memcgs?
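A toy userspace model of the scenario above may help. It assumes, purely for illustration, that each cursor-less invocation restarts at the first child, takes at most one batch from each child in turn, and drains inactive pages before active ones. Units are MB, and struct child, child_writeback() and writeback_no_cursor() are hypothetical names, not zswap code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one child memcg's zswap pool; units are MB. */
struct child {
	long inactive;
	long active;
	long written_inactive;
	long written_active;
};

/* Write back up to @nr from one child, inactive (older) entries first. */
static long child_writeback(struct child *c, long nr)
{
	long from_inactive = nr < c->inactive ? nr : c->inactive;
	long from_active;

	c->inactive -= from_inactive;
	c->written_inactive += from_inactive;
	nr -= from_inactive;

	from_active = nr < c->active ? nr : c->active;
	c->active -= from_active;
	c->written_active += from_active;
	return from_inactive + from_active;
}

/*
 * One invocation without a cursor: the walk always restarts at children[0]
 * and takes at most @batch from each child in turn. Returns the amount
 * written back (less than @request only if every child is empty).
 */
static long writeback_no_cursor(struct child *children, size_t n,
				long request, long batch)
{
	long done = 0;
	int progress = 1;

	assert(batch > 0);
	while (done < request && progress) {
		progress = 0;
		for (size_t i = 0; i < n && done < request; i++) {
			long want = request - done < batch ? request - done : batch;
			long got = child_writeback(&children[i], want);

			done += got;
			if (got)
				progress = 1;
		}
	}
	return done;
}
```

With two children of {100 inactive, 100 active} and fifteen invocations of 10 at a batch of 32, this model writes back 100 inactive plus 50 active from the first child and nothing from the second. A single invocation of 64 instead takes 32 from each child. This is consistent with the point that the problem shows up when each request is smaller than the batch.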
>> I gave it some thought. Not using a cursor could lead to unfairness
>> issues with certain writeback sizes:
>> - If the writeback size is an odd multiple of WB_BATCH and there are
>>   2 child cgroups, the split between them is uneven (e.g., triggering
>>   a writeback of 3 * WB_BATCH might end up with a 2:1 ratio).
>> - If a memcg has 5 child cgroups and a writeback of 2 * WB_BATCH is
>>   triggered repeatedly, every writeback might hit only the first 2
>>   child cgroups.
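The two corner cases can be reproduced in a minimal sketch. It assumes that every child has unlimited zswap and that each WB_BATCH-sized step goes to the next child in round-robin order. writeback_batches() and MAX_CHILDREN are hypothetical names. A NULL cursor models the cursor-less walk. A non-NULL cursor models a persistent per-memcg cursor.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 16	/* hypothetical bound for the model */

/*
 * Write back @nr_batches batches, one child per batch, round-robin.
 * @cursor == NULL: restart at child 0 on every invocation (no cursor).
 * Otherwise: resume at *cursor and store where this invocation stopped.
 * written[i] counts batches taken from child i.
 */
static void writeback_batches(long written[], size_t n, long nr_batches,
			      size_t *cursor)
{
	size_t pos = cursor ? *cursor : 0;

	assert(n > 0 && n <= MAX_CHILDREN);
	for (long b = 0; b < nr_batches; b++) {
		written[pos]++;
		pos = (pos + 1) % n;
	}
	if (cursor)
		*cursor = pos;
}
```

In this model, one 3-batch invocation over 2 children yields {2, 1}. Five 2-batch invocations over 5 children yield {5, 5, 0, 0, 0} without a cursor and {2, 2, 2, 2, 2} with one.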
>> Although setting a smaller WB_BATCH might mitigate this unfairness, it
>> could hurt writeback efficiency. Let's just use per-memcg cursors to
>> completely fix these corner cases.
> Exactly, the batch size should be small enough that any unfairness is
> not a problem. I would honestly just do batching without a per-memcg
> cursor, unless we have numbers to prove that the efficiency is
> affected when we use a small batch size. Let's only introduce
> complexity when needed please.
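To put the batch-size argument into numbers, here is a sketch under the same simplifying assumptions: unlimited zswap per child, and a single cursor-less invocation that restarts at child 0 and takes at most one batch per step. no_cursor_skew() is a hypothetical helper. Within one invocation, the gap between the most and least written-back child never exceeds one batch. The model does not capture repeated small invocations, each of which restarts at the first child.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 16	/* hypothetical bound for the model */

/*
 * Write back @request pages from @n children in one cursor-less pass,
 * at most @batch pages per child per step, starting at child 0.
 * Returns max - min of the pages written back per child.
 */
static long no_cursor_skew(long request, long batch, size_t n)
{
	long written[MAX_CHILDREN] = { 0 };
	long max, min;
	size_t i = 0;

	assert(batch > 0 && n > 0 && n <= MAX_CHILDREN);
	while (request > 0) {
		long step = request < batch ? request : batch;

		written[i] += step;
		request -= step;
		i = (i + 1) % n;
	}

	max = min = written[0];
	for (i = 1; i < n; i++) {
		if (written[i] > max)
			max = written[i];
		if (written[i] < min)
			min = written[i];
	}
	return max - min;
}
```

For a request of 3 * 512 pages over 2 children, a batch of 512 leaves a 512-page gap (the 2:1 case), while a batch of 32 splits it evenly.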
If you prefer not to use per-cgroup cursors, do we still need to keep
the global cursor zswap_next_shrink (i.e., the root cgroup's cursor)?
I found this part quite tricky when trying to reuse the main logic of
shrink_worker() in zswap_proactive_writeback().

Of course, we could also keep zswap_next_shrink and add a small helper
that checks whether the target is the root cgroup, so that the two
cases can use different memcg iteration methods.
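A rough userspace sketch of that helper follows. Only the root target keeps a persistent cursor, standing in for zswap_next_shrink. Any other target starts its walk from the first child every time. struct memcg, root_cursor, wb_iter_start() and wb_walk() are all hypothetical. In the kernel, the check would presumably use mem_cgroup_is_root() and the walk mem_cgroup_iter().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a writeback target; not the kernel's mem_cgroup. */
struct memcg {
	bool is_root;
	size_t nr_children;
};

/* Stand-in for zswap_next_shrink: only the root remembers its position. */
static size_t root_cursor;

/* Hypothetical helper: choose where the child walk for @target begins. */
static size_t wb_iter_start(const struct memcg *target)
{
	return target->is_root ? root_cursor : 0;
}

/* Visit @nr_batches children, recording the visit order in @order. */
static void wb_walk(const struct memcg *target, long nr_batches,
		    size_t order[])
{
	size_t pos = wb_iter_start(target);

	assert(target->nr_children > 0);
	for (long b = 0; b < nr_batches; b++) {
		order[b] = pos;
		pos = (pos + 1) % target->nr_children;
	}
	if (target->is_root)
		root_cursor = pos;	/* persist progress for the root only */
}
```

With three children, successive 2-batch walks from the root visit {0, 1} and then {2, 0}. The same walks from a non-root target visit {0, 1} each time.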
Thanks,
Hao