[PATCH] mm/page_alloc.c: Avoid infinite retries caused by cpuset race

From: Tianyang Zhang
Date: Wed Apr 16 2025 - 04:27:04 EST


__alloc_pages_slowpath() does not detect changes to ac->nodemask on
its retry path, while cpuset can modify it in parallel. For a process
whose mempolicy is MPOL_BIND, this means ac->nodemask can change
underneath the allocator. should_reclaim_retry() then decides based on
the latest nodemask and jumps to retry, while get_page_from_freelist()
only traverses the zonelist starting from ac->preferred_zoneref, which
was selected using the stale nodemask. In some cases this causes
infinite retries.

cpu 64:
__alloc_pages_slowpath {
    /* ..... */
retry:
    /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
    if (alloc_flags & ALLOC_KSWAPD)
        wake_all_kswapds(order, gfp_mask, ac);
    /* cpu 1:
    cpuset_write_resmask
        update_nodemask
            update_nodemasks_hier
                update_tasks_nodemask
                    mpol_rebind_task
                        mpol_rebind_policy
                            mpol_rebind_nodemask
    // mempolicy->nodes has been modified,
    // which ac->nodemask point to

    */
    /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
    if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
                             did_some_progress > 0, &no_progress_loops))
        goto retry;
}
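
The race above can be modelled in userspace. The following is only a
minimal sketch, not kernel code: every demo_* name is hypothetical, the
"zonelist" is reduced to a forward walk over two nodes, and the
concurrent cpuset update is replayed once at a fixed point in a single
thread. It assumes node 1 has no free pages and node 0 does. The
check_cookie flag stands in for a sequence-cookie recheck at the retry
label, in the spirit of what this patch adds.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NODES    2
#define DEMO_LIVELOCK (-1)   /* gave up after max_loops retries */
#define DEMO_FAIL     (-2)   /* retry predicate said "stop" */

/* Hypothetical, heavily reduced stand-in for allocator + cpuset state. */
struct demo_state {
	unsigned int seq;            /* bumped on every nodemask rebind */
	unsigned long nodemask;      /* allowed nodes, one bit per node */
	int preferred_nid;           /* where the zonelist walk starts */
	int free_pages[DEMO_NODES];
};

static int demo_first_node(unsigned long mask)
{
	for (int nid = 0; nid < DEMO_NODES; nid++)
		if (mask & (1UL << nid))
			return nid;
	return -1;
}

/* Walks forward from preferred_nid only, like a zonelist traversal. */
static int demo_get_page(const struct demo_state *st)
{
	for (int nid = st->preferred_nid; nid < DEMO_NODES; nid++)
		if ((st->nodemask & (1UL << nid)) && st->free_pages[nid] > 0)
			return nid;
	return -1;
}

/* Looks at every node in the *current* nodemask, not just the walk. */
static bool demo_should_retry(const struct demo_state *st)
{
	for (int nid = 0; nid < DEMO_NODES; nid++)
		if ((st->nodemask & (1UL << nid)) && st->free_pages[nid] > 0)
			return true;
	return false;
}

/* Models the cpuset writer: change the mask and bump the sequence. */
static void demo_rebind(struct demo_state *st, unsigned long new_mask)
{
	st->nodemask = new_mask;
	st->seq++;
}

/* Returns the node allocated from, DEMO_LIVELOCK, or DEMO_FAIL. */
int demo_slowpath(bool check_cookie, int max_loops)
{
	struct demo_state st = { .nodemask = 0x2, .free_pages = { 4, 0 } };
	bool rebound = false;
	unsigned int cookie;
	int loops = 0;
	int nid;

	assert(max_loops > 0);
restart:
	cookie = st.seq;
	st.preferred_nid = demo_first_node(st.nodemask);
retry:
	/* The recheck modelled after this patch: stale cookie => restart. */
	if (check_cookie && st.seq != cookie)
		goto restart;
	if (++loops > max_loops)
		return DEMO_LIVELOCK;
	nid = demo_get_page(&st);
	if (nid >= 0)
		return nid;
	/* Replay the concurrent cpuset update exactly once. */
	if (!rebound) {
		demo_rebind(&st, 0x3);
		rebound = true;
	}
	if (demo_should_retry(&st))
		goto retry;
	return DEMO_FAIL;
}
```

In this model, demo_slowpath(false, 1000) returns DEMO_LIVELOCK: the
retry predicate keeps seeing free pages on node 0, but the walk still
starts at node 1. demo_slowpath(true, 1000) returns 0, because the
stale cookie forces a restart that recomputes the preferred node.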

Starting multiple instances of the LTP cpuset01 test simultaneously
quickly reproduces this issue on a multi-node server once memory
pressure reaches its maximum and swap is enabled.

Signed-off-by: Tianyang Zhang <zhangtianyang@xxxxxxxxxxx>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd6b865cb1ab..1e82f5214a42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4530,6 +4530,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
}

retry:
+ /*
+ * Deal with possible cpuset update races or zonelist updates to avoid
+ * infinite retries.
+ */
+ if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+ check_retry_zonelist(zonelist_iter_cookie))
+ goto restart;
+
/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
--
2.20.1