[PATCH 0/8] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting

From: Joshua Hahn

Date: Thu Feb 26 2026 - 14:32:30 EST


INTRODUCTION
============
The current design for zswap and zsmalloc leaves a clean divide between
layers of the memory stack. At the higher level, we have zswap, which
interacts directly with memory consumers and compression algorithms, and
handles memory usage accounting via memcg limits. At the lower level,
we have zsmalloc, which handles the allocation and migration of
physical pages.

While this logical separation simplifies the codebase, it makes
accounting difficult wherever both memory cgroup awareness and physical
memory location are required. To name a few problems:

- On tiered systems, it is impossible to understand how much toptier
  memory a cgroup is using, since zswap has no understanding of where
  the compressed memory is physically stored.
  + With SeongJae Park's work to store incompressible pages as-is in
    zswap [1], the size of compressed memory can become non-trivial,
    and easily consume a meaningful portion of memory.

- cgroups that restrict memory nodes have no control over which nodes
  their zswapped objects live on. This can lead to unexpectedly high
  fault times for workloads, which must pay the remote access latency
  cost of retrieving the compressed object from a remote node.
  + Nhat Pham addressed this issue via a best-effort attempt to place
    compressed objects on the same node as the original page, but this
    cannot guarantee complete isolation [2].

- On the flip side, zsmalloc's ignorance of cgroups also makes its
  shrinker memcg-unaware, which can lead to ineffective reclaim when
  pressure is localized to a single cgroup.

Until recently, zpool acted as another layer of indirection between
zswap and zsmalloc, which made bridging memcg and physical location
difficult. Now that zsmalloc is the only allocator backend for zswap and
zram [3], it is possible to move memory-cgroup accounting to the
zsmalloc layer.

Introduce a new per-zpdesc array of objcg pointers to track
per-memcg-lruvec memory usage by zswap, while leaving zram users
unaffected.
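
As a rough illustration of the idea (not the code in this series), the
per-page owner array can be modeled in userspace as below. All names
here (mock_objcg, mock_zpdesc, mock_obj_charge, mock_obj_uncharge,
OBJS_PER_PAGE) are hypothetical stand-ins, and the one-slot-per-object
indexing is an assumption. A NULL owner models a zram object, which is
stored but never charged.

```c
#include <assert.h>
#include <stddef.h>

#define OBJS_PER_PAGE 8	/* hypothetical: object slots per backing page */

/* Stand-in for struct obj_cgroup: just a byte counter. */
struct mock_objcg {
	size_t charged_bytes;
};

/* Stand-in for struct zpdesc: one owner slot per object in the page. */
struct mock_zpdesc {
	struct mock_objcg *objcgs[OBJS_PER_PAGE];
};

/* Record the owner of object @idx and charge it; NULL (zram) is a no-op. */
static void mock_obj_charge(struct mock_zpdesc *zpdesc, unsigned int idx,
			    struct mock_objcg *objcg, size_t size)
{
	assert(idx < OBJS_PER_PAGE);
	zpdesc->objcgs[idx] = objcg;
	if (objcg)
		objcg->charged_bytes += size;
}

/* Find the owner through the page itself, uncharge it, clear the slot. */
static void mock_obj_uncharge(struct mock_zpdesc *zpdesc, unsigned int idx,
			      size_t size)
{
	struct mock_objcg *objcg;

	assert(idx < OBJS_PER_PAGE);
	objcg = zpdesc->objcgs[idx];
	if (objcg)
		objcg->charged_bytes -= size;
	zpdesc->objcgs[idx] = NULL;
}
```

Because the owner is recovered from the page descriptor at free time,
the caller does not need to carry its own copy of the pointer.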

This creates one source of truth for NR_ZSWAP, and more accurate
accounting for NR_ZSWAPPED.

This brings sizeof(struct zpdesc) from 56 bytes to 64 bytes, but this
increase in size is unseen by the rest of the system because zpdesc
overlays struct page. Implementation details and care taken to handle
the page->memcg_data field can be found in patch 3.
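
To make the overlay argument concrete, here is a simplified userspace
mock of the two layouts on a 64-bit build. The structs and field names
are hypothetical stand-ins, not the kernel definitions, and the
assumption that the new word lines up with page->memcg_data is inferred
from the paragraph above rather than taken from patch 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a 64-byte struct page; only the position of the tail matters. */
struct mock_page_layout {
	uint64_t flags;
	uint64_t payload[5];	/* stand-in for the per-user union */
	uint32_t mapcount;
	uint32_t refcount;
	uint64_t memcg_data;	/* final word of the page */
};

/* Mock zpdesc: 56 bytes of existing fields plus one new 8-byte word. */
struct mock_zpdesc_layout {
	uint64_t flags;
	uint64_t existing[5];	/* stand-in for the current zpdesc fields */
	uint32_t first_obj_offset;
	uint32_t refcount;
	uint64_t objcgs;	/* new: would hold the objcg array pointer */
};

/* The descriptor must never outgrow the page it overlays. */
_Static_assert(sizeof(struct mock_zpdesc_layout) ==
	       sizeof(struct mock_page_layout),
	       "mock zpdesc must be the same size as mock page");

/* Assumed here: the new word aliases the memcg_data slot. */
_Static_assert(offsetof(struct mock_zpdesc_layout, objcgs) ==
	       offsetof(struct mock_page_layout, memcg_data),
	       "objcgs is assumed to overlay memcg_data");
```

Compile-time checks of this kind turn any layout drift into a build
failure instead of silent corruption of a neighbouring field.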

In addition, move memcg charging to the zsmalloc layer. zswap is
currently the only user of this accounting.

PATCH OUTLINE
=============
Patches 1 and 2 are small cleanups that make the codebase consistent and
easier to digest.

Patches 3, 4, and 5 allocate and populate the new zpdesc->objcgs field
with compressed objects' obj_cgroups. zswap_entry->objcg is removed,
and its users are redirected to look at the zspage for memcg information.

Patch 6 moves the charging and lifetime management of obj_cgroups to
the zsmalloc layer, which leaves zswap only as a plumbing layer to hand
cgroup information to zsmalloc.

Patches 7 and 8 introduce node counters and memcg-lruvec counters for
zswap. Special care is taken for compressed objects that span multiple
nodes.
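
One way to picture the multi-node case: an object that straddles a page
boundary has bytes on two backing pages, which may sit on different
nodes, so each node's counter should only receive the bytes that live
on it. The sketch below is a userspace illustration under that
assumption; mock_split_obj, mock_obj_split and MOCK_PAGE_SIZE are
hypothetical names, not functions from this series.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u	/* hypothetical page size for the sketch */

/* Bytes of one object that land on its first and second backing page. */
struct mock_obj_split {
	size_t first;
	size_t second;
};

/*
 * Split an object of @size bytes starting at @offset within its first
 * page. Whatever does not fit before the page end spills onto the next
 * page, which may belong to another node.
 */
static struct mock_obj_split mock_split_obj(size_t offset, size_t size)
{
	struct mock_obj_split split;
	size_t room;

	assert(offset < MOCK_PAGE_SIZE);
	assert(size <= MOCK_PAGE_SIZE);

	room = MOCK_PAGE_SIZE - offset;
	split.first = size < room ? size : room;
	split.second = size - split.first;
	return split;
}
```

For example, a 200-byte object starting at offset 4000 would be counted
as 96 bytes on the first page's node and 104 bytes on the second's.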

[1] https://lore.kernel.org/linux-mm/20250822190817.49287-1-sj@xxxxxxxxxx/
[2] https://lore.kernel.org/linux-mm/20250402204416.3435994-1-nphamcs@xxxxxxxxx/#t3
[3] https://lore.kernel.org/linux-mm/20250829162212.208258-1-hannes@xxxxxxxxxxx/
[4] https://lore.kernel.org/linux-mm/c8bc2dce-d4ec-c16e-8df4-2624c48cfc06@xxxxxxxxxx/

Joshua Hahn (8):
  mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
  mm/zsmalloc: Make all obj_idx unsigned ints
  mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
  mm/zsmalloc: Store obj_cgroup pointer in zpdesc
  mm/zsmalloc,zswap: Redirect zswap_entry->obcg to zpdesc
  mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
  mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
  mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec

 drivers/block/zram/zram_drv.c |  17 +-
 include/linux/memcontrol.h    |  15 +-
 include/linux/mmzone.h        |   2 +
 include/linux/zsmalloc.h      |   6 +-
 mm/memcontrol.c               |  68 ++------
 mm/vmstat.c                   |   2 +
 mm/zpdesc.h                   |  25 ++-
 mm/zsmalloc.c                 | 282 ++++++++++++++++++++++++++++++++--
 mm/zswap.c                    |  67 ++++----
 9 files changed, 345 insertions(+), 139 deletions(-)

--
2.47.3