Re: [PATCH v4 1/5] mm/zswap: Extend shrink_memcg() writeback capability
From: Yosry Ahmed
Date: Tue Jun 23 2026 - 14:18:22 EST
> My initial thought was that if cold memory is evenly distributed across
> nodes and we are doing a large writeback, it would be better to balance
> the zswap entry writeback across all nodes rather than just draining
> node 0 first. However, since we currently lack a proper metric to
> represent hot/cold memory (such as age-based tracking), doing this
> probably doesn't make much sense right now.
Yeah let's start simple and go from there.
>
> So, perhaps we want something like this? Please correct me if I'm wrong.
>
> static long shrink_memcg(struct mem_cgroup *memcg,
> unsigned long nr_to_scan)
> {
> struct zswap_shrink_walk_arg walk_arg = {
> .bytes_written = 0,
> .encountered_page_in_swapcache = false,
> };
> unsigned long nr_remaining = nr_to_scan;
> bool memcg_list_is_empty = true;
> int nid;
>
> if (!mem_cgroup_zswap_writeback_enabled(memcg))
> return -ENOENT;
>
> if (memcg && !mem_cgroup_online(memcg))
> return -ENOENT;
>
> for_each_node_state(nid, N_NORMAL_MEMORY) {
> unsigned long nr_to_walk;
>
> /*
> * Cap the per-node scan by the current LRU length. A referenced
> * entry is only rotated to the tail (second chance) and may be
> * revisited within a single walk; without this cap those rotated
> * entries could drain the shared scan budget on one node.
> */
The comment here is a bit misleading. It's not just about draining one
node. One call to shrink_memcg() should only scan entries once. The
caller can then choose to scan the memcg again, or scan a different
one. In this case, the caller should iterate all memcgs first before
retrying memcgs again and reclaiming rotated entries.
> nr_to_walk = min(nr_remaining,
> list_lru_count_one(&zswap_list_lru, nid, memcg));
> if (!nr_to_walk)
> continue;
> memcg_list_is_empty = false;
>
> nr_remaining -= nr_to_walk;
> list_lru_walk_one(&zswap_list_lru, nid, memcg,
> &shrink_memcg_cb, &walk_arg, &nr_to_walk);
> /* Return the unused share of the budget to the pool. */
> nr_remaining += nr_to_walk;
>
> /* Bail out once the whole scan budget has been spent. */
The comment is unnecessary.
> if (!nr_remaining)
> break;
>
> cond_resched();
Did you observe a problem here or did you just add this due to an
abundance of caution?
> }
>
> if (memcg_list_is_empty)
Do we need memcg_list_is_empty? Can we just check if nr_remaining
matches nr_to_scan?
> return -ENOENT;
>
> return walk_arg.bytes_written;
> }
>
> Thanks,
> Hao