Re: [PATCH v7 4/4] mm: swap: filter swap allocation by memcg tier mask

In reply to: Kairui Song: "Re: [PATCH v7 4/4] mm: swap: filter swap allocation by memcg tier mask"

From: Nhat Pham

Date: Sat May 30 2026 - 14:00:21 EST

On Tue, May 26, 2026 at 11:23 PM Youngjun Park <youngjun.park@xxxxxxx> wrote:
>
> Apply memcg tier effective mask during swap slot allocation to
> enforce per-cgroup swap tier restrictions.
>
> In the fast path, check the percpu cached swap_info's tier_mask
> against the folio's effective mask. If it does not match, fall
> through to the slow path. In the slow path, skip swap devices
> whose tier_mask is not covered by the folio's effective mask.
>
> This works correctly when there is only one non-rotational
> device in the system and no devices share the same priority.
> However, there are known limitations:
>
> - When non-rotational devices are distributed across multiple
> tiers, and different memcgs are configured to use those
> distinct tiers, they may constantly overwrite the shared
> percpu swap cache. This cache thrashing leads to frequent
> fast path misses.
>
> - Combined with the above issue, if same-priority devices exist
> among them, a percpu cache miss (overwritten by another memcg)
> forces the allocator to round-robin to the next device
> prematurely, even if the current cluster is not fully
> exhausted.
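To make the fast/slow path split and the first limitation concrete, here is a rough userspace model in C. Everything in it is hypothetical: struct swap_dev, struct percpu_swap_cache, alloc_swap_dev() and the subset test in tier_allowed() are illustrative names and an assumed reading of "covered by the folio's effective mask", not the code from the patch. With two memcgs bound to different tiers sharing one CPU, every allocation misses the fast path because each one overwrites the single cached device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a swap device: one tier bit, a priority, a full flag. */
struct swap_dev {
	unsigned int tier_mask;
	int prio;
	bool full;
};

/* Hypothetical per-CPU cache: a single shared slot for the last-used device. */
struct percpu_swap_cache {
	int dev; /* index into devs[], or -1 if empty */
};

/* Counters so the fast-path hit rate can be observed. */
struct alloc_stats {
	unsigned long fast_hits;
	unsigned long slow_allocs;
};

/* Assumption: a device is allowed only if all its tier bits are in the mask. */
static bool tier_allowed(const struct swap_dev *d, unsigned int effective_mask)
{
	return (d->tier_mask & ~effective_mask) == 0;
}

static int alloc_swap_dev(const struct swap_dev *devs, size_t ndev,
			  struct percpu_swap_cache *pcp,
			  unsigned int effective_mask, struct alloc_stats *st)
{
	/* Fast path: reuse the cached device if its tier is allowed. */
	if (pcp->dev >= 0) {
		assert((size_t)pcp->dev < ndev);
		const struct swap_dev *d = &devs[pcp->dev];
		if (!d->full && tier_allowed(d, effective_mask)) {
			st->fast_hits++;
			return pcp->dev;
		}
	}

	/* Slow path: pick the highest-priority device whose tier is allowed. */
	int best = -1;
	for (size_t i = 0; i < ndev; i++) {
		if (devs[i].full || !tier_allowed(&devs[i], effective_mask))
			continue;
		if (best < 0 || devs[i].prio > devs[best].prio)
			best = (int)i;
	}
	if (best >= 0) {
		st->slow_allocs++;
		pcp->dev = best; /* overwrites the slot other memcgs also use */
	}
	return best;
}
```

Alternating allocations between masks 0x1 and 0x2 on the same cache produce no fast-path hits at all, while repeated allocations under one mask hit immediately.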

I had very similar issues when I tried hacking vswap on top of swap
table too... It's even worse over there because it's not just
performance - vswap needs special handling in certain cases, and in
some places cannot be used at all (e.g. in zswap writeback). I
ended up having to add separate caching for the vswap device:

https://lore.kernel.org/all/20260528212955.1912856-1-nphamcs@xxxxxxxxx/

How expensive would it be to add per-cpu caching for each device? :(
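One way to picture that trade-off: give each CPU one cached slot per tier, so a memcg only reads and overwrites the slots of tiers in its own mask. This is a hypothetical sketch, not the vswap patch linked above or anything in this series; MAX_TIERS, struct percpu_tier_cache and alloc_swap_dev_tiered() are invented names, and it assumes each device belongs to exactly one tier. The price for avoiding the thrashing is MAX_TIERS slots per CPU instead of one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIERS 4 /* hypothetical upper bound on the number of tiers */

/* Hypothetical device model; assumes exactly one bit set in tier_mask. */
struct swap_dev {
	unsigned int tier_mask;
	int prio;
	bool full;
};

/* Hypothetical per-CPU cache with one slot per tier instead of one shared slot. */
struct percpu_tier_cache {
	int dev[MAX_TIERS]; /* cached device index per tier, -1 if empty */
};

static void tier_cache_init(struct percpu_tier_cache *pcp)
{
	for (int t = 0; t < MAX_TIERS; t++)
		pcp->dev[t] = -1;
}

/* Index of the lowest set bit; the mask must be non-zero. */
static int tier_index(unsigned int mask)
{
	int t = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		t++;
	}
	return t;
}

static int alloc_swap_dev_tiered(const struct swap_dev *devs, size_t ndev,
				 struct percpu_tier_cache *pcp,
				 unsigned int effective_mask, bool *fast_hit)
{
	*fast_hit = false;

	/* Fast path: only look at slots of allowed tiers, in tier-index order. */
	for (int t = 0; t < MAX_TIERS; t++) {
		int d = pcp->dev[t];

		if (!(effective_mask & (1u << t)) || d < 0)
			continue;
		if (!devs[d].full) {
			*fast_hit = true;
			return d;
		}
	}

	/* Slow path: same filtered priority scan as the single-slot model. */
	int best = -1;
	for (size_t i = 0; i < ndev; i++) {
		if (devs[i].full || (devs[i].tier_mask & ~effective_mask))
			continue;
		if (best < 0 || devs[i].prio > devs[best].prio)
			best = (int)i;
	}
	if (best >= 0)
		pcp->dev[tier_index(devs[best].tier_mask)] = best; /* only this tier's slot */
	return best;
}

/* Per-CPU memory cost of the tiered cache for a given number of CPUs. */
static size_t tier_cache_bytes(size_t nr_cpus)
{
	return nr_cpus * sizeof(struct percpu_tier_cache);
}
```

With the same two memcgs alternating on one CPU, only the first allocation per tier goes to the slow path; after that both hit their own slot.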

Anyway, as a first step, this LGTM. Reviewing from the swap mechanism
perspective, and leaving the cgroup side to memcg folks:

Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>