[PATCH v2 9/9] mm: zswap: per-node kmem accounting for zswap/zsmalloc

From: Alexandre Ghiti

Date: Fri Jun 26 2026 - 06:34:28 EST


Update zswap and zsmalloc to use per-node obj_cgroup for kmem
accounting, attributing compressed page charges to the correct
NUMA node.

This is still incomplete: a compressed entry can straddle two pages
that may sit on different nodes, yet the whole charge is attributed to
a single node.

Joshua will handle that case correctly in a separate series [1].

Link: https://lore.kernel.org/linux-mm/20260311195153.4013476-1-joshua.hahnjy@xxxxxxxxx/ [1]
Signed-off-by: Alexandre Ghiti <alex@xxxxxxxx>
---
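Note on the straddling limitation described in the commit message: the
userspace sketch below shows how one object's bytes divide between two
pages. It is an illustration only, not code from this series. The names
(sketch_split, sketch_split_object, SKETCH_PAGE_SIZE) are hypothetical,
and it assumes 4 KiB pages and objects that span at most two pages.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical result type: bytes of one object living on each page. */
struct sketch_split {
	size_t first;	/* bytes on the page where the object starts */
	size_t second;	/* bytes spilling onto the following page */
};

/*
 * Split an object of @len bytes that starts at @offset within its first
 * page. Assumes the object spans at most two pages.
 */
static struct sketch_split sketch_split_object(size_t offset, size_t len)
{
	struct sketch_split s;
	size_t room;

	assert(offset < SKETCH_PAGE_SIZE);
	assert(len <= SKETCH_PAGE_SIZE);

	room = SKETCH_PAGE_SIZE - offset;	/* space left on first page */
	s.first = len < room ? len : room;
	s.second = len - s.first;
	return s;
}
```

For example, a 200-byte object starting at offset 4000 places 96 bytes
on the first page and 104 bytes on the second. If those pages belong to
different nodes, zs_handle_to_nid() still reports one node, so all 200
bytes are charged there.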
include/linux/zsmalloc.h | 2 ++
mm/zsmalloc.c | 11 +++++++++++
mm/zswap.c | 19 ++++++++++++++++++-
3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 478410c880b1..30427f3fe232 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -50,6 +50,8 @@ void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle);
void zs_obj_write(struct zs_pool *pool, unsigned long handle,
void *handle_mem, size_t mem_len);

+int zs_handle_to_nid(struct zs_pool *pool, unsigned long handle);
+
extern const struct movable_operations zsmalloc_mops;

#endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 83f5820c45f9..17f7403ebe77 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1380,6 +1380,17 @@ static void obj_free(int class_size, unsigned long obj)
mod_zspage_inuse(zspage, -1);
}

+int zs_handle_to_nid(struct zs_pool *pool, unsigned long handle)
+{
+ unsigned long obj;
+ struct zpdesc *zpdesc;
+
+ obj = handle_to_obj(handle);
+ obj_to_zpdesc(obj, &zpdesc);
+ return page_to_nid(zpdesc_page(zpdesc));
+}
+EXPORT_SYMBOL(zs_handle_to_nid);
+
void zs_free(struct zs_pool *pool, unsigned long handle)
{
struct zspage *zspage;
diff --git a/mm/zswap.c b/mm/zswap.c
index 761cd699e0a3..466c6a3f4ef3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1438,7 +1438,24 @@ static bool zswap_store_page(struct page *page,
*/
zswap_pool_get(pool);
if (objcg) {
- obj_cgroup_get(objcg);
+ struct obj_cgroup *nid_objcg;
+ int nid = zs_handle_to_nid(pool->zs_pool, entry->handle);
+
+ /*
+ * obj_cgroup_nid() returns a borrowed RCU pointer (no
+ * reference), so the returned per-node objcg may be freed
+ * (kfree_rcu) before we use it. Pin it with a tryget inside a
+ * single rcu section; if it is already dying, fall back to the
+ * folio objcg (held by the caller) so the charge still lands on
+ * the right memcg, just without per-node attribution.
+ */
+ rcu_read_lock();
+ nid_objcg = obj_cgroup_nid(objcg, nid);
+ if (nid_objcg && obj_cgroup_tryget(nid_objcg))
+ objcg = nid_objcg;
+ else
+ obj_cgroup_get(objcg);
+ rcu_read_unlock();
obj_cgroup_charge_zswap(objcg, entry->length);
}
atomic_long_inc(&zswap_stored_pages);
--
2.54.0
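
The pin-or-fall-back logic in the zswap_store_page() hunk can be
modelled in userspace as follows. This is a rough sketch under stated
assumptions, not the kernel implementation: sketch_ref, sketch_tryget,
sketch_get and sketch_pin are hypothetical names, a plain C11 atomic
counter stands in for the objcg reference count, and the RCU read-side
section that keeps the per-node object's memory valid during the tryget
is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as an obj_cgroup. */
struct sketch_ref {
	atomic_long count;	/* 0 means the object is dying */
};

/* Take a reference only if the count is still non-zero. */
static bool sketch_tryget(struct sketch_ref *r)
{
	long old = atomic_load(&r->count);

	while (old > 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Unconditional get: the caller already holds a reference. */
static void sketch_get(struct sketch_ref *r)
{
	assert(atomic_load(&r->count) > 0);
	atomic_fetch_add(&r->count, 1);
}

/*
 * Prefer @per_node if it can be pinned; otherwise take an extra
 * reference on @fallback, which the caller keeps alive.
 */
static struct sketch_ref *sketch_pin(struct sketch_ref *per_node,
				     struct sketch_ref *fallback)
{
	if (per_node && sketch_tryget(per_node))
		return per_node;
	sketch_get(fallback);
	return fallback;
}
```

Either way exactly one reference is taken on the object that ends up
being charged, so the matching put on the uncharge path stays balanced
whether or not per-node attribution succeeded.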