[PATCH v6 4/4] mm: swap: filter swap allocation by memcg tier mask
From: Youngjun Park
Date: Tue Apr 21 2026 - 01:53:54 EST
Apply the memcg tier effective mask during swap slot allocation to
enforce per-cgroup swap tier restrictions.

In the fast path, check the tier_mask of the percpu cached swap_info
against the folio's effective mask. If it does not match, fall through
to the slow path. In the slow path, skip swap devices whose tier_mask
is not covered by the folio's effective mask.

This works correctly when there is only one non-rotational device in
the system and no devices share the same priority. Other
configurations have known limitations:

- When non-rotational devices are distributed across multiple tiers
  and different memcgs are configured to use those distinct tiers,
  the memcgs may constantly overwrite the shared percpu swap cache.
  This cache thrashing leads to frequent fast path misses.

- In combination with the issue above, if some of those devices share
  the same priority, a percpu cache miss (the cache having been
  overwritten by another memcg) forces the allocator to round-robin
  to the next device prematurely, even if the current cluster is not
  fully exhausted.

These edge cases do not affect the primary use case of directing swap
traffic per cgroup. Further optimization is left for future work.

Signed-off-by: Youngjun Park <youngjun.park@xxxxxxx>
---
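Note (not part of the commit): the slow path filtering described in the
commit message can be modelled in userspace roughly as follows. This is
only a sketch. The demo_* names are hypothetical, and the subset
semantics of the mask test are an assumption; the real
swap_tiers_mask_test() is defined earlier in this series and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct swap_info_struct (only the field used here). */
struct demo_swap_dev {
	const char *name;
	int tier_mask;	/* bit(s) of the tier this device belongs to */
};

/*
 * ASSUMPTION: "covered" means every tier bit of the device is also set
 * in the folio's effective mask. The real swap_tiers_mask_test() may
 * use different semantics.
 */
static bool demo_tiers_mask_test(int dev_mask, int effective_mask)
{
	return (dev_mask & effective_mask) == dev_mask;
}

/*
 * Walk the devices in priority order and return the first one the
 * cgroup may use, mirroring the "continue" added to swap_alloc_slow().
 * Returns NULL when no device is allowed.
 */
static const struct demo_swap_dev *
demo_pick_device(const struct demo_swap_dev *devs, size_t n, int effective_mask)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!demo_tiers_mask_test(devs[i].tier_mask, effective_mask))
			continue;	/* device belongs to a tier the memcg cannot use */
		return &devs[i];
	}
	return NULL;
}
```

With two devices whose tier masks are 0x1 and 0x2, an effective mask of
0x2 skips the first device and selects the second, while an effective
mask with no matching bit selects nothing.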
mm/swapfile.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
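
Note (not part of the commit): the first limitation in the commit
message, percpu cache thrashing between memcgs bound to distinct tiers,
can be illustrated with a simplified userspace model. This is a sketch
under assumptions: the demo_* names are hypothetical, the cache is
reduced to a single device pointer, the mask test is assumed to be a
subset check, and the slow path is assumed to always refresh the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dev {
	int tier_mask;
};

/* Hypothetical model of one CPU's cached device for a given order. */
struct demo_pcp_cache {
	const struct demo_dev *si;
};

/* ASSUMPTION: subset semantics; the real helper may differ. */
static bool demo_mask_ok(int dev_mask, int effective_mask)
{
	return (dev_mask & effective_mask) == dev_mask;
}

/* Fast path: hits only if the cached device is allowed for this mask. */
static bool demo_alloc_fast(const struct demo_pcp_cache *c, int mask)
{
	return c->si && demo_mask_ok(c->si->tier_mask, mask);
}

/* Slow path: pick the first allowed device and overwrite the cache. */
static bool demo_alloc_slow(struct demo_pcp_cache *c,
			    const struct demo_dev *devs, size_t n, int mask)
{
	for (size_t i = 0; i < n; i++) {
		if (demo_mask_ok(devs[i].tier_mask, mask)) {
			c->si = &devs[i];
			return true;
		}
	}
	return false;
}

/* Count fast path misses for a sequence of per-allocation effective masks. */
static unsigned int demo_count_misses(const struct demo_dev *devs, size_t n,
				      const int *masks, size_t nalloc)
{
	struct demo_pcp_cache cache = { NULL };
	unsigned int misses = 0;

	assert(masks != NULL || nalloc == 0);

	for (size_t i = 0; i < nalloc; i++) {
		if (demo_alloc_fast(&cache, masks[i]))
			continue;
		misses++;
		demo_alloc_slow(&cache, devs, n, masks[i]);
	}
	return misses;
}
```

In this model, two memcgs with disjoint masks that alternate
allocations on the same CPU miss the fast path every time, whereas a
single memcg misses only on its first allocation.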
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d5abc831cde7..8734e5d26b08 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1352,15 +1352,22 @@ static bool swap_alloc_fast(struct folio *folio)
 	struct swap_cluster_info *ci;
 	struct swap_info_struct *si;
 	unsigned int offset;
+	int mask = folio_tier_effective_mask(folio);
 
 	/*
 	 * Once allocated, swap_info_struct will never be completely freed,
 	 * so checking it's liveness by get_swap_device_info is enough.
 	 */
 	si = this_cpu_read(percpu_swap_cluster.si[order]);
+	if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
+	    !get_swap_device_info(si))
+		return false;
+
 	offset = this_cpu_read(percpu_swap_cluster.offset[order]);
-	if (!si || !offset || !get_swap_device_info(si))
+	if (!offset) {
+		put_swap_device(si);
 		return false;
+	}
 
 	ci = swap_cluster_lock(si, offset);
 	if (cluster_is_usable(ci, order)) {
@@ -1379,10 +1386,14 @@ static bool swap_alloc_fast(struct folio *folio)
 static void swap_alloc_slow(struct folio *folio)
 {
 	struct swap_info_struct *si, *next;
+	int mask = folio_tier_effective_mask(folio);
 
 	spin_lock(&swap_avail_lock);
 start_over:
 	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
+		if (!swap_tiers_mask_test(si->tier_mask, mask))
+			continue;
+
 		/* Rotate the device and switch to a new cluster */
 		plist_requeue(&si->avail_list, &swap_avail_head);
 		spin_unlock(&swap_avail_lock);
--
2.34.1