Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
From: Nhat Pham
Date: Sat Mar 29 2025 - 18:13:41 EST
On Sat, Mar 29, 2025 at 12:53 PM Yosry Ahmed <yosry.ahmed@xxxxxxxxx> wrote:
>
> March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@xxxxxxxxx> wrote:
>
> > Currently, systems with CXL-based memory tiering can encounter the
> > following inversion with zswap: the coldest pages demoted to the CXL
> > tier can return to the high tier when they are zswapped out,
> > creating memory pressure on the high tier.
> > This happens because zsmalloc, zswap's backend memory allocator, does
> > not enforce any memory policy. If the task reclaiming memory follows
> > the local-first policy for example, the memory requested for zswap can
> > be served by the upper tier, leading to the aforementioned inversion.
> > This RFC fixes this inversion by adding a new memory allocation mode
> > for zswap (exposed through a zswap sysfs knob), intended for
> > hosts with CXL, where the memory for the compressed object is requested
> > preferentially from the same node that the original page resides on.
>
> I didn't look too closely, but why not just prefer the same node by default? Why is a knob needed?
Good question, yeah the knob is to maintain the old behavior :) The new
mode might not be optimal, or even advisable, for all setups.
For hosts with node-based memory tiering, yeah, it's a good idea in
general, but I don't quite know how to detect that kind of setup from
the kernel's perspective.
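To make the difference between the two modes concrete, here is a toy
userspace model. It is only a sketch, not the patch itself: toy_node,
toy_alloc_mode and toy_zswap_pick_node() are made-up names, node 0 stands
in for the top tier and node 1 for the CXL tier, and the fallback order
is a simplification of what the page allocator really does.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* assumption: node 0 = top tier, node 1 = CXL tier */

/* Hypothetical model of a NUMA node: just a count of free pages. */
struct toy_node {
	size_t free_pages;
};

/* Hypothetical stand-in for the proposed knob. */
enum toy_alloc_mode {
	TOY_ALLOC_DEFAULT,	/* old behavior: follow the reclaiming task */
	TOY_ALLOC_SAME_NODE,	/* new mode: prefer the original page's node */
};

/*
 * Pick the node that backs the compressed object and take one page
 * from it. The preferred node is tried first; the other nodes are
 * only a fallback. Returns the node id, or -1 if every node is full.
 */
static int toy_zswap_pick_node(struct toy_node nodes[NR_NODES],
			       enum toy_alloc_mode mode,
			       int task_node, int page_node)
{
	int preferred = (mode == TOY_ALLOC_SAME_NODE) ? page_node : task_node;

	assert(preferred >= 0 && preferred < NR_NODES);

	for (int i = 0; i < NR_NODES; i++) {
		int nid = (preferred + i) % NR_NODES;

		if (nodes[nid].free_pages > 0) {
			nodes[nid].free_pages--;
			return nid;
		}
	}
	return -1;
}
```

With a reclaimer running on node 0 and a cold page sitting on node 1,
the default mode puts the compressed copy on node 0 (the inversion),
while the same-node mode keeps it on node 1 and only falls back to
node 0 when node 1 has nothing left.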
>
> Or maybe if there's a way to tell the "tier" of the node we can prefer to allocate from the same "tier"?
Is there an abstraction of the "tier" that we can use here?