Re: [PATCH v3 1/4] mm/zswap: Make shrink_worker writeback cursor per-memcg

From: Hao Jia

Date: Sun Jun 14 2026 - 22:46:12 EST

On 2026/6/13 02:15, Yosry Ahmed wrote:

On Fri, Jun 12, 2026 at 9:40 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:

On Thu, Jun 11, 2026 at 05:39:16PM +0000, Yosry Ahmed wrote:

On Tue, Jun 09, 2026 at 11:18:26AM +0800, Hao Jia wrote:

On 2026/6/9 02:01, Nhat Pham wrote:

On Mon, Jun 8, 2026 at 9:48 AM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:

But OTOH, this does seem like a recipe for inefficient reclaim. We
might exhaust hotter memory of a cgroup while sparing colder memory of
another cgroup... But maybe if they're all cold anyway, then who
cares, and eventually you'll get to the cold stuff of the other children?

Forgot to respond to this part: the unfairness is limited to the batch
size per-invocation, so it should be fine as long as you don't divide
the amount over 100 iterations for some reason. Also yes, all memory
in zswap is cold, the relative coldness is not that important (e.g.
compared to relative coldness during reclaim).

Ok then yeah, I think we should shelve per-memcg cursor for the next
version. Down the line, if we have more data that unfairness is an
issue, we can always fix it. One step at a time :)

Thanks a lot to Yosry, Nhat, and Shakeel for the great suggestions!

Let me summarize what I plan to do in the next version to make sure we are
on the same page:

- Drop the per-memcg cursor and keep the root cgroup cursor
(zswap_next_shrink) logic intact.
- Stick to using the zswap_writeback_only key, and change the proactive
writeback size to use the compressed size.
- Consolidate and reuse the logic between shrink_worker() and
shrink_memcg(). Enable batch writeback in the shrink_worker() path, while
keeping the writeback behavior in the zswap_store() path unchanged.

Please let me know if I missed or misunderstood anything. Thanks again for
clearing things up!

Sorry for the late response. Yes, I think this makes sense. However, I
have some comments about how this interacts with swap tiering; let me
reply to the other thread.

I think the swap tiers interaction will be figured out over the next
cycle. However, Hao can/should continue to push this, and we may decide
to merge it independently of swap tiers.

Yeah I think there are a lot of changes we discussed outside of the
memcg interface, so maybe keep the interface as-is for now, work on a
new version with the other changes, and we can finalize the interface
at the end?

Okay, I will split the parts that are independent of the memcg interface
into a few separate patches. These will serve as preparation work for
proactive writeback and will enable batch writeback in the
shrink_worker() path.

However, I will still send the complete patchset using the
zswap_writeback_only key approach in the next version. This should make
it easier to review whether the preparation logic is reasonable, and to
decide whether it should eventually be merged independently of the swap
tiers.
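
To make the batching idea concrete, here is a rough userspace toy model
(not kernel code, and not what the patches will look like). It assumes a
single shared cursor that walks the groups round-robin and writes back
at most one batch per visit. All names (toy_group, toy_cursor, TOY_BATCH,
toy_shrink_group, toy_shrink_worker) are hypothetical and only loosely
mirror zswap_next_shrink, shrink_memcg() and shrink_worker().

```c
#include <stddef.h>

#define TOY_NR_GROUPS 3
#define TOY_BATCH     4  /* hypothetical per-visit batch size */

/* Hypothetical stand-in for a memcg: just counters, no real LRU. */
struct toy_group {
	size_t stored;        /* entries still held in the toy "zswap" */
	size_t written_back;  /* entries written back so far */
};

/* Single shared cursor, in the spirit of a global next-shrink pointer. */
static size_t toy_cursor;

/* Write back at most TOY_BATCH entries from one group. */
static size_t toy_shrink_group(struct toy_group *g, size_t want)
{
	size_t n = want < TOY_BATCH ? want : TOY_BATCH;

	if (n > g->stored)
		n = g->stored;
	g->stored -= n;
	g->written_back += n;
	return n;
}

/*
 * Walk the groups round-robin from the shared cursor until the target
 * is met or a full pass over all groups makes no progress.
 */
static size_t toy_shrink_worker(struct toy_group *groups, size_t target)
{
	size_t done = 0;
	size_t idle = 0;  /* consecutive visits that wrote nothing */

	while (done < target && idle < TOY_NR_GROUPS) {
		size_t n = toy_shrink_group(&groups[toy_cursor], target - done);

		toy_cursor = (toy_cursor + 1) % TOY_NR_GROUPS;
		idle = n ? 0 : idle + 1;
		done += n;
	}
	return done;
}
```

With three groups of 10 entries each and a target of 10, a single call
takes 4, 4 and 2 entries from the three groups, so within one invocation
no group is ahead of another by more than one batch.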

Thanks,
Hao