Re: [PATCH v6 4/4] mm: swap: filter swap allocation by memcg tier mask
From: YoungJun Park
Date: Tue May 26 2026 - 22:17:17 EST
On Wed, May 27, 2026 at 09:42:54AM +0800, Baoquan He wrote:
> On 04/21/26 at 02:53pm, Youngjun Park wrote:
> > Apply memcg tier effective mask during swap slot allocation to
> > enforce per-cgroup swap tier restrictions.
> >
> > In the fast path, check the percpu cached swap_info's tier_mask
> > against the folio's effective mask. If it does not match, fall
> > through to the slow path. In the slow path, skip swap devices
> > whose tier_mask is not covered by the folio's effective mask.
> >
> > This works correctly when there is only one non-rotational
> > device in the system and no devices share the same priority.
> > However, there are known limitations:
> >
> > - When non-rotational devices are distributed across multiple
> > tiers, and different memcgs are configured to use those
> > distinct tiers, they may constantly overwrite the shared
> > percpu swap cache. This cache thrashing leads to frequent
> > fast path misses.
> >
> > - Combined with the above issue, if same-priority devices exist
> > among them, a percpu cache miss (overwritten by another memcg)
> > forces the allocator to round-robin to the next device
> > prematurely, even if the current cluster is not fully
> > exhausted.
> >
> > These edge cases do not affect the primary use case of
> > directing swap traffic per cgroup. Further optimization is
> > planned for future work.
> >
> > Signed-off-by: Youngjun Park <youngjun.park@xxxxxxx>
> > ---
> > mm/swapfile.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d5abc831cde7..8734e5d26b08 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1352,15 +1352,22 @@ static bool swap_alloc_fast(struct folio *folio)
> > struct swap_cluster_info *ci;
> > struct swap_info_struct *si;
> > unsigned int offset;
> > + int mask = folio_tier_effective_mask(folio);
> >
> > /*
> > * Once allocated, swap_info_struct will never be completely freed,
> > * so checking it's liveness by get_swap_device_info is enough.
> > */
> > si = this_cpu_read(percpu_swap_cluster.si[order]);
> > + if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
> > + !get_swap_device_info(si))
> > + return false;
> > +
> > offset = this_cpu_read(percpu_swap_cluster.offset[order]);
> > - if (!si || !offset || !get_swap_device_info(si))
> > + if (!offset) {
> > + put_swap_device(si);
> > return false;
> > + }
>
> The whole patch looks good to me except for one nitpick. Would it be a
> little cleaner with the tiny adjustment below?
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2864cd8c2da9..cdf453bf6b80 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1359,15 +1359,12 @@ static bool swap_alloc_fast(struct folio *folio)
> * so checking it's liveness by get_swap_device_info is enough.
> */
> si = this_cpu_read(percpu_swap_cluster.si[order]);
> - if (!si || !swap_tiers_mask_test(si->tier_mask, mask) ||
> - !get_swap_device_info(si))
> + if (!si || !swap_tiers_mask_test(si->tier_mask, mask))
> return false;
>
> offset = this_cpu_read(percpu_swap_cluster.offset[order]);
> - if (!offset) {
> - put_swap_device(si);
> + if (!offset || !get_swap_device_info(si))
> return false;
> - }
Thanks!
Your suggested version of the code is simpler than the previous one.
I will apply it.
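
For anyone following the thread, here is a rough userspace model of why the reordered checks are simpler: the reference is taken last, so no early return has to drop it. Everything named model_* is a hypothetical stand-in and not the kernel code. In particular, the subset semantics of the mask test are my assumption about what swap_tiers_mask_test() does, and the liveness flag only imitates get_swap_device_info().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a swap device; not the kernel's swap_info_struct. */
struct model_dev {
	int tier_mask;	/* tiers this device belongs to */
	int refs;	/* references currently held */
	bool alive;	/* false models a device that is being swapped off */
};

/* Assumed semantics: the device's tiers must be a subset of the allowed mask. */
static bool model_mask_test(int dev_mask, int allowed)
{
	return (dev_mask & allowed) == dev_mask;
}

/* Imitates get_swap_device_info(): take a reference only if the device is live. */
static bool model_get(struct model_dev *dev)
{
	if (!dev->alive)
		return false;
	dev->refs++;
	return true;
}

/* Imitates put_swap_device(): drop a reference taken by model_get(). */
static void model_put(struct model_dev *dev)
{
	dev->refs--;
}

/*
 * Fast-path checks in the reordered form: every cheap test runs first and
 * the reference is taken last, so no failure path needs a matching put.
 */
static bool model_alloc_fast(struct model_dev *dev, unsigned int offset,
			     int allowed)
{
	if (!dev || !model_mask_test(dev->tier_mask, allowed))
		return false;

	if (!offset || !model_get(dev))
		return false;

	/* The real allocator would try the cached cluster here. */
	model_put(dev);
	return true;
}
```

In this model, every path that returns false leaves refs unchanged, which is the property the earlier version needed an explicit put_swap_device() call to maintain on the !offset path.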