Re: [PATCH v2 9/9] mm: zswap: per-node kmem accounting for zswap/zsmalloc
From: Nhat Pham
Date: Mon Jun 29 2026 - 14:09:19 EST
On Mon, Jun 29, 2026 at 6:38 AM Alexandre Ghiti <alex@xxxxxxxx> wrote:
>
> Hi Nhat,
>
> On 6/26/26 20:36, Nhat Pham wrote:
> > On Fri, Jun 26, 2026 at 7:32 AM Usama Arif <usama.arif@xxxxxxxxx> wrote:
> >> On Fri, 26 Jun 2026 12:20:58 +0200 Alexandre Ghiti <alex@xxxxxxxx> wrote:
> >>
> >>> Update zswap and zsmalloc to use per-node obj_cgroup for kmem
> >>> accounting, attributing compressed page charges to the correct
> >>> NUMA node.
> >>>
> >>> But actually, this is incomplete because it does not correctly account
> >>> for entries that straddle pages, those pages being possibly on 2 different
> >>> nodes.
> >>>
> >>> This will be correctly handled by Joshua in a different series [1].
> >>>
> >>> Link: https://lore.kernel.org/linux-mm/20260311195153.4013476-1-joshua.hahnjy@xxxxxxxxx/ [1]
> >>> Signed-off-by: Alexandre Ghiti <alex@xxxxxxxx>
> >>> ---
> >>> include/linux/zsmalloc.h | 2 ++
> >>> mm/zsmalloc.c | 11 +++++++++++
> >>> mm/zswap.c | 19 ++++++++++++++++++-
> >>> 3 files changed, 31 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> >>> index 478410c880b1..30427f3fe232 100644
> >>> --- a/include/linux/zsmalloc.h
> >>> +++ b/include/linux/zsmalloc.h
> >>> @@ -50,6 +50,8 @@ void zs_obj_read_sg_end(struct zs_pool *pool, unsigned long handle);
> >>> void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> >>> void *handle_mem, size_t mem_len);
> >>>
> >>> +int zs_handle_to_nid(struct zs_pool *pool, unsigned long handle);
> >>> +
> >>> extern const struct movable_operations zsmalloc_mops;
> >>>
> >>> #endif
> >>> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> >>> index 83f5820c45f9..17f7403ebe77 100644
> >>> --- a/mm/zsmalloc.c
> >>> +++ b/mm/zsmalloc.c
> >>> @@ -1380,6 +1380,17 @@ static void obj_free(int class_size, unsigned long obj)
> >>> mod_zspage_inuse(zspage, -1);
> >>> }
> >>>
> >>> +int zs_handle_to_nid(struct zs_pool *pool, unsigned long handle)
> >>> +{
> >>> + unsigned long obj;
> >>> + struct zpdesc *zpdesc;
> >>> +
> >>> + obj = handle_to_obj(handle);
> >>> + obj_to_zpdesc(obj, &zpdesc);
> >>> + return page_to_nid(zpdesc_page(zpdesc));
> >>> +}
> >>> +EXPORT_SYMBOL(zs_handle_to_nid);
> >> Does this need the same locking as the other handle-to-zspage paths?
> >> zs_free() takes pool->lock before handle_to_obj() because zspage migration can
> >> update or move the object behind the handle. This helper does the same decode
> >> without the lock, so zswap's uncharge path can race migration and charge or
> >> uncharge the wrong node, or observe transient zspage state.
> >>
> > Can we just charge it to the page's node for now? Once Joshua's patch
> > series is in, we can correctly charge the node owning the data :)
>
>
> Even if this patch's accounting is incorrect, it is close to reality.
> Using the original page's node would give results that are really off, no?
The current policy is same-node-first, i.e. we prefer the same node as
the original page for the compressed data too:
https://github.com/torvalds/linux/commit/56e5a103a721d0ef139bba7ff3d3ada6c8217d5b
So the accuracy of this estimate depends on how much fallback to other
nodes we had to do...
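
To illustrate what I mean, here is a userspace toy model (not kernel
code, and not how zsmalloc actually allocates). struct node_model,
place_same_node_first() and count_misattributed() are made-up names for
this sketch, and the per-node page budget is an assumption. It only shows
that charging the original page's node is wrong exactly as often as we
fall back:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count for this model */

/* Toy model: how many backing pages each node can still provide. */
struct node_model {
	size_t free_pages[MAX_NODES];
};

/*
 * Same-node-first: try the original page's node, then fall back to the
 * first other node that still has room. Returns the chosen nid, or -1
 * if every node is exhausted.
 */
static int place_same_node_first(struct node_model *m, int preferred_nid)
{
	assert(preferred_nid >= 0 && preferred_nid < MAX_NODES);

	if (m->free_pages[preferred_nid] > 0) {
		m->free_pages[preferred_nid]--;
		return preferred_nid;
	}

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if (m->free_pages[nid] > 0) {
			m->free_pages[nid]--;
			return nid;
		}
	}

	return -1;
}

/*
 * Place up to n entries and count how many landed off the preferred
 * node, i.e. how many charges would be attributed to the wrong node if
 * we always charged the original page's node.
 */
static size_t count_misattributed(struct node_model *m, int preferred_nid,
				  size_t n)
{
	size_t off_node = 0;

	for (size_t i = 0; i < n; i++) {
		int nid = place_same_node_first(m, preferred_nid);

		if (nid < 0)
			break;	/* nothing left anywhere */
		if (nid != preferred_nid)
			off_node++;
	}

	return off_node;
}
```

With no memory pressure on the preferred node the count stays at zero,
which is the case where charging the original page's node is exact.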
>
>
> >
> > FWIW, this is how these zswap entries are organized in the LRU too -
> > following the original page's node.
>
>
> Oh, we should do something about that right? Because the compressed data
> is not necessarily on the original page's node.
At initial placement? Yes, it's fixable.
But the bigger problem is migration of a zsmalloc object. That
requires moving the zswap entry from one LRU list to another :) I'm not
sure what synchronization method is safe here - at minimum we need to
take the LRU locks, but what about incoming writeback, zswap load,
etc.?
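
For the "at minimum" part only, a userspace sketch of what I have in
mind (again a toy model, not a proposal for the actual code): struct lru,
lru_add() and lru_move() are hypothetical names, the spinlock is a plain
C11 atomic_flag, and the caller is assumed to guarantee that the entry is
not concurrently freed or moved. It takes both per-node locks in
ascending nid order to avoid ABBA deadlocks, and it does nothing about
writeback or load racing with the move:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count for this model */

struct entry {
	struct entry *prev, *next;
	int nid;		/* which per-node list the entry is on */
};

struct lru {
	atomic_flag lock;	/* toy spinlock standing in for the LRU lock */
	struct entry head;	/* circular list sentinel */
	size_t nr;
};

static void lru_init(struct lru *l)
{
	atomic_flag_clear(&l->lock);
	l->head.prev = l->head.next = &l->head;
	l->nr = 0;
}

static void lru_lock(struct lru *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		;	/* spin */
}

static void lru_unlock(struct lru *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Caller holds l->lock. */
static void lru_add_locked(struct lru *l, struct entry *e)
{
	e->next = l->head.next;
	e->prev = &l->head;
	l->head.next->prev = e;
	l->head.next = e;
	l->nr++;
}

/* Caller holds l->lock. */
static void lru_del_locked(struct lru *l, struct entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	l->nr--;
}

static void lru_add(struct lru *lrus, struct entry *e, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	lru_lock(&lrus[nid]);
	e->nid = nid;
	lru_add_locked(&lrus[nid], e);
	lru_unlock(&lrus[nid]);
}

/* Move e to new_nid's list, locking both lists in ascending nid order. */
static void lru_move(struct lru *lrus, struct entry *e, int new_nid)
{
	int old_nid = e->nid;	/* stable only under the caller's guarantee */
	struct lru *first, *second;

	assert(new_nid >= 0 && new_nid < MAX_NODES);
	if (old_nid == new_nid)
		return;

	first = &lrus[old_nid < new_nid ? old_nid : new_nid];
	second = &lrus[old_nid < new_nid ? new_nid : old_nid];

	lru_lock(first);
	lru_lock(second);
	lru_del_locked(&lrus[old_nid], e);
	e->nid = new_nid;
	lru_add_locked(&lrus[new_nid], e);
	lru_unlock(second);
	lru_unlock(first);
}
```

The hard part is everything this leaves out: whoever is walking the old
list for writeback, or looking the entry up for a load, can still see it
mid-move unless they are covered by the same locks.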
>
> Thanks,
>
> Alex
>