[PATCH 2/2] mm/percpu: Avoid pcpu_alloc_mutex recursion from reclaim

From: Kaitao Cheng

Date: Thu May 28 2026 - 09:33:39 EST

From: Kaitao Cheng <chengkaitao@xxxxxxxxxx>

pcpu_alloc_noprof() takes pcpu_alloc_mutex for sleepable allocations
so that it can create chunks and populate backing pages. If reclaim is
entered while that mutex is already held, and reclaim reaches a path
which allocates percpu memory, the nested allocation can try to take
pcpu_alloc_mutex again.

That creates a reclaim recursion dependency:

  pcpu_alloc_noprof(GFP_KERNEL)
    mutex_lock(&pcpu_alloc_mutex)
    reclaim
      pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS)
        mutex_lock(&pcpu_alloc_mutex)

Avoid this by treating percpu allocations from reclaim context as atomic.
Such allocations may still be served from already available and populated
areas, but they must not enter the mutex-protected slow path or create new
chunks. If no space is available, fail the allocation and let the normal
balance work handle replenishment outside reclaim.
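
As an illustration only, the decision described above can be modelled in
userspace C. This is a minimal sketch, not kernel code: MODEL_GFP_BLOCKING,
struct model_task and the model_* helpers are hypothetical names invented
for the example. They stand in for a blocking-capable GFP mask, for
current->reclaim_state, and for the is_atomic test in pcpu_alloc_noprof().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_GFP_BLOCKING 0x1u	/* stands in for a mask that allows blocking */

struct model_task {
	const void *reclaim_state;	/* non-NULL while the task runs reclaim */
};

/*
 * Mirrors the patched condition: the allocation is atomic if the mask
 * forbids blocking, or if the caller is already in reclaim.
 */
static bool model_is_atomic(unsigned int gfp, const struct model_task *task)
{
	return !(gfp & MODEL_GFP_BLOCKING) || task->reclaim_state != NULL;
}

/* Only non-atomic allocations may enter the mutex-protected slow path. */
static bool model_may_take_mutex(unsigned int gfp,
				 const struct model_task *task)
{
	return !model_is_atomic(gfp, task);
}

/* Checks the three cases the commit message distinguishes. */
static void model_self_check(void)
{
	static const int marker = 0;
	struct model_task normal = { .reclaim_state = NULL };
	struct model_task reclaiming = { .reclaim_state = &marker };

	/* Blocking mask outside reclaim: slow path allowed. */
	assert(model_may_take_mutex(MODEL_GFP_BLOCKING, &normal));
	/* Non-blocking mask: atomic, as before the patch. */
	assert(!model_may_take_mutex(0, &normal));
	/* Blocking mask inside reclaim: now treated as atomic. */
	assert(!model_may_take_mutex(MODEL_GFP_BLOCKING, &reclaiming));
}
```

The third case is the only one whose result changes with this patch: a
blocking-capable mask no longer reaches the mutex when the caller is in
reclaim.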

Update the function comment to describe that reclaim context allocations
are atomic regardless of whether the supplied GFP mask would otherwise
allow blocking.

This patch is a preventive fix. There may not currently be any path that
calls pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS) from direct reclaim context.

Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
Signed-off-by: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
---
mm/percpu.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 1bb38467390b..9c30e5897813 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1803,9 +1803,9 @@ static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
  * @gfp: allocation flags
  *
  * Allocate percpu area of @size bytes aligned at @align. If @gfp doesn't
- * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
- * then no warning will be triggered on invalid or failed allocation
- * requests.
+ * allow blocking, or if allocation is requested from reclaim context, the
+ * allocation is atomic. If @gfp has __GFP_NOWARN then no warning will be
+ * triggered on invalid or failed allocation requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
@@ -1828,7 +1828,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 	gfp = current_gfp_context(gfp);
 	/* whitelisted flags that can be passed to the backing allocators */
 	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-	is_atomic = !gfpflags_allow_blocking(gfp);
+	/*
+	 * Reclaim can be entered while pcpu_alloc_mutex is already held by
+	 * another percpu allocation. Avoid recursing back into the mutex from
+	 * reclaim; best-effort allocations from already populated areas are OK.
+	 */
+	is_atomic = !gfpflags_allow_blocking(gfp) || current->reclaim_state;
 	do_warn = !(gfp & __GFP_NOWARN);

 	/*
--
2.50.1 (Apple Git-155)