Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg
From: Nhat Pham
Date: Thu Jun 04 2026 - 13:37:49 EST
On Thu, Jun 4, 2026 at 6:06 AM Hao Jia <jiahao.kernel@xxxxxxxxx> wrote:
>
> On 2026/6/4 13:34, Yosry Ahmed wrote:
> >>>> For instance, suppose a parent memcg has two children, memcg1 and memcg2,
> >>>> each with 200MB of zswap (100MB inactive). Triggering proactive writeback on
> >>>> the parent memcg will exhaust memcg1's inactive zswap pages. After that,
> >>>> even though memcg2 still has plenty of inactive zswap pages, it will
> >>>> continue to write back memcg1's active zswap pages. Writing back active
> >>>> zswap pages causes the user-space agent to prematurely abort the writeback
> >>>> because it detects that certain memcg metrics have exceeded predefined
> >>>> thresholds.
> >>>
> >>> This will only happen if the reclaim size is smaller than the batch
> >>> size, right? Otherwise the kernel should reclaim more or less equally
> >>> from both memcgs?
> >>>
> >>
> >> I gave it some thought. Not using a cursor could lead to unfairness
> >> issues with certain writeback sizes:
> >>
> >> - If the writeback size is an odd multiple of WB_BATCH (e.g.,
> >> triggering a writeback of 3 * WB_BATCH), with 2 child cgroups, the
> >> writeback ratio might end up being 2:1.
> >> - If a memcg has 5 child cgroups and a writeback of 2 * WB_BATCH is
> >> triggered, it might repeatedly write back from only the first 2 child
> >> cgroups.
> >>
> >> Although setting a smaller WB_BATCH might mitigate this unfairness, it
> >> could hurt writeback efficiency. Let's just use per-memcg cursors to
> >> completely fix these corner cases.
> >
> > Exactly, the batch size should be small enough that any unfairness is
> > not a problem. I would honestly just do batching without a per-memcg
> > cursor, unless we have numbers to prove that the efficiency is
> > affected when we use a small batch size. Let's only introduce
> > complexity when needed please.
I'm neutral on the complexity of a per-memcg cursor. I don't
think it's that big of a deal, but we should only add it if it's warranted.
Hao, if you're convinced that small batches are not efficient,
could you run some experiments to show how much bigger batches plus
the cursor improve efficiency and fairness? Maybe implement small
batches without a per-memcg cursor first. Then implement a patch on
top of that to add the per-memcg cursor, and show how much of a
performance win that patch gives us on top of the rest of the series?
FWIW, zswap writeback right now is not that batch-efficient :) There
is no IO batching and no batched lock operations (we drop the lock
whenever we attempt to write back a page), etc. That might be a good
avenue for optimization.