Re: [PATCH 2/2] mm/percpu: Avoid pcpu_alloc_mutex recursion from reclaim

From: Pedro Falcato

Date: Fri May 29 2026 - 05:38:55 EST

On Thu, May 28, 2026 at 09:29:17PM +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
>
> pcpu_alloc_noprof() takes pcpu_alloc_mutex for sleepable allocations
> so that it can create chunks and populate backing pages. If reclaim is
> entered while that mutex is already held, and reclaim reaches a path
> which allocates percpu memory, the nested allocation can try to take
> pcpu_alloc_mutex again.
>
> That creates a reclaim recursion dependency:
>
> pcpu_alloc_noprof(GFP_KERNEL)
>   mutex_lock(&pcpu_alloc_mutex)
>   reclaim
>     pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS)
>       mutex_lock(&pcpu_alloc_mutex)
>
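To make the dependency above concrete, here is a userspace model built with
-pthread; it is not kernel code. model_alloc() and model_reclaim() are
hypothetical stand-ins for pcpu_alloc_noprof() and for a reclaim path that
itself needs percpu memory. A PTHREAD_MUTEX_ERRORCHECK mutex stands in for
pcpu_alloc_mutex, so the self-deadlock is reported as EDEADLK instead of
hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for pcpu_alloc_mutex; ERRORCHECK reports relocking. */
static pthread_mutex_t model_alloc_mutex;

static void model_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&model_alloc_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
}

static int model_alloc(bool enters_reclaim);

/* Hypothetical reclaim path that performs a nested percpu allocation. */
static int model_reclaim(void)
{
	return model_alloc(false);
}

/* Returns 0 on success, or the error from a (nested) lock attempt. */
static int model_alloc(bool enters_reclaim)
{
	int err = pthread_mutex_lock(&model_alloc_mutex);

	if (err)
		return err;	/* EDEADLK: this thread already holds the mutex */
	if (enters_reclaim)
		err = model_reclaim();	/* backing page allocation enters reclaim */
	pthread_mutex_unlock(&model_alloc_mutex);
	return err;
}
```

After model_init(), model_alloc(false) returns 0, and model_alloc(true)
returns EDEADLK. With a regular kernel mutex, the nested mutex_lock() would
instead block forever.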
> Avoid this by treating percpu allocations from reclaim context as atomic.
> Such allocations may still be served from already available and populated
> areas, but they must not enter the mutex-protected slow path or create new
> chunks. If no space is available, fail the allocation and let the normal
> balance work handle replenishment outside reclaim.
>
> Update the function comment to describe that reclaim context allocations
> are atomic regardless of whether the supplied GFP mask would otherwise
> allow blocking.
>
> This patch is a preventive fix. There may not currently be any path that
> calls pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS) from direct reclaim context.

I don't like this. The proper way of fixing this would probably be to release
pcpu_alloc_mutex while you're allocating memory (or not have it in the first
place!), so that reclaim is never entered with the mutex held.
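Roughly, the idea is to never hold the mutex across a call that can enter
reclaim. Below is a userspace sketch of that idea, not a patch. model_alloc(),
model_get_pages() and model_reclaim() are hypothetical stand-ins, and an
error-checking pthread mutex plays pcpu_alloc_mutex. The real allocator would
also have to revalidate its chunk state after retaking the lock; that step is
only marked by a comment here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for pcpu_alloc_mutex; ERRORCHECK reports relocking. */
static pthread_mutex_t model_alloc_mutex;

static void model_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&model_alloc_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
}

static int model_alloc(bool enters_reclaim);

/* Hypothetical reclaim path that performs a nested percpu allocation. */
static int model_reclaim(void)
{
	return model_alloc(false);
}

/* Hypothetical backing-page allocator; called WITHOUT model_alloc_mutex. */
static int model_get_pages(bool enters_reclaim)
{
	return enters_reclaim ? model_reclaim() : 0;
}

static int model_alloc(bool enters_reclaim)
{
	int err = pthread_mutex_lock(&model_alloc_mutex);

	if (err)
		return err;
	/* ... pick a free area under the lock ... */
	pthread_mutex_unlock(&model_alloc_mutex);

	/* Allocate backing memory unlocked: reclaim may safely re-enter us. */
	err = model_get_pages(enters_reclaim);
	if (err)
		return err;

	err = pthread_mutex_lock(&model_alloc_mutex);
	if (err)
		return err;
	/* ... revalidate the area; others may have raced in meanwhile ... */
	pthread_mutex_unlock(&model_alloc_mutex);
	return 0;
}
```

Here, unlike the current locking, model_alloc(true) returns 0, because the
nested allocation from reclaim finds the mutex free.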

>
> Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
> Signed-off-by: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
> ---
> mm/percpu.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 1bb38467390b..9c30e5897813 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1803,9 +1803,9 @@ static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
> * @gfp: allocation flags
> *
> * Allocate percpu area of @size bytes aligned at @align. If @gfp doesn't
> - * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
> - * then no warning will be triggered on invalid or failed allocation
> - * requests.
> + * allow blocking, or if allocation is requested from reclaim context, the
> + * allocation is atomic. If @gfp has __GFP_NOWARN then no warning will be
> + * triggered on invalid or failed allocation requests.
> *
> * RETURNS:
> * Percpu pointer to the allocated area on success, NULL on failure.
> @@ -1828,7 +1828,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> gfp = current_gfp_context(gfp);
> /* whitelisted flags that can be passed to the backing allocators */
> pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
> - is_atomic = !gfpflags_allow_blocking(gfp);
> + /*
> + * Reclaim can be entered while pcpu_alloc_mutex is already held by
> + * another percpu allocation. Avoid recursing back into the mutex from
> + * reclaim; best-effort allocations from already populated areas are OK.
> + */

Since this is an entirely theoretical issue, I'd simply do:

	/* Reclaim paths should not be hitting the percpu allocator, for now */
	if (WARN_ON_ONCE(current->reclaim_state))
		return NULL;

But that's just my 2c.
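To spell out how the two policies differ, here is a userspace truth-table
model. struct model_ctx, patch_alloc() and suggested_alloc() are hypothetical
names, and the slow path is assumed to always succeed. The patch still lets
reclaim be served from already-populated areas and fails only when it would
need the slow path. The WARN_ON_ONCE() variant refuses every allocation from
reclaim, which also makes any such caller visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the allocator's inputs; not kernel types. */
struct model_ctx {
	bool gfp_allows_blocking;	/* gfpflags_allow_blocking(gfp) */
	bool in_reclaim;		/* current->reclaim_state != NULL */
	bool populated_space;		/* a free, already populated area exists */
};

static int model_warned;	/* crude WARN_ON_ONCE() stand-in */

/* Patch: reclaim context is treated as atomic. */
static bool patch_alloc(const struct model_ctx *c)
{
	bool is_atomic = !c->gfp_allows_blocking || c->in_reclaim;

	/* Already populated space can be handed out in any context. */
	if (c->populated_space)
		return true;
	/* Slow path (mutex, new chunks) only for non-atomic callers. */
	return !is_atomic;
}

/* Suggestion: reclaim is not expected to allocate at all; warn and fail. */
static bool suggested_alloc(const struct model_ctx *c)
{
	if (c->in_reclaim) {
		if (!model_warned) {
			model_warned = 1;
			fprintf(stderr, "WARNING: percpu allocation from reclaim\n");
		}
		return false;
	}
	return c->populated_space || c->gfp_allows_blocking;
}
```

The two variants differ in only one case: a reclaim-context request that could
be served from populated space. The patch serves it, and the suggestion fails
it loudly.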

> + is_atomic = !gfpflags_allow_blocking(gfp) || current->reclaim_state;
> do_warn = !(gfp & __GFP_NOWARN);
>
> /*
> --
> 2.50.1 (Apple Git-155)
>

--
Pedro