Re: [PATCH 1/2] mm/percpu: Preserve NOFS/NOIO scope during chunk create and populate
From: Pedro Falcato
Date: Fri May 29 2026 - 05:27:33 EST
On Thu, May 28, 2026 at 09:29:16PM +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
>
> pcpu_alloc_noprof() derives pcpu_gfp from the caller supplied GFP mask and
> passes it to the backing percpu allocators. This preserves GFP_NOFS and
> GFP_NOIO for pcpu_alloc_pages() and for the initial pcpu_chunk allocation.
>
> However, the chunk creation and population slow paths also call helpers
> which do not take a GFP mask and perform internal allocations with
> GFP_KERNEL. For example, pcpu_create_chunk() calls pcpu_get_vm_areas(),
> and population can allocate temporary metadata or page tables while mapping
> backing pages. As a result, a caller which explicitly uses GFP_NOFS or
> GFP_NOIO can still enter FS or IO reclaim while creating or populating a
> percpu chunk.
>
> This is problematic for callers which use GFP_NOFS or GFP_NOIO because
> they are already holding filesystem or IO-path locks. If free chunks are
> exhausted, the percpu allocation can take pcpu_alloc_mutex and then enter
> unconstrained reclaim from these internal allocations, defeating the
> caller's allocation context and potentially recreating reclaim lock
> dependencies.
>
> Wrap chunk creation and population in a scoped NOIO or NOFS context when
> pcpu_gfp has the corresponding constraints. Leave ordinary GFP_KERNEL
> allocations unchanged so they retain full reclaim capability.
>
> Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")

I assume you _did not_ observe this in production? As in, no reclaim path
should be insane^W daring enough to do pcpu allocations?

> Signed-off-by: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
> ---
> mm/percpu.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 71a85d7245c7..1bb38467390b 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1778,6 +1778,23 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
> }
> #endif
>
> +static unsigned int pcpu_memalloc_scope_save(gfp_t gfp)
> +{
> +	if (!(gfp & __GFP_IO))
> +		return memalloc_noio_save();
> +	if (!(gfp & __GFP_FS))
> +		return memalloc_nofs_save();
> +	return 0;
> +}
> +
> +static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
> +{
> +	if (!(gfp & __GFP_IO))
> +		memalloc_noio_restore(flags);
> +	else if (!(gfp & __GFP_FS))
> +		memalloc_nofs_restore(flags);
> +}

I disagree with this. We already have gfp flags, and they are already
passed to pcpu_create_chunk() and pcpu_populate_chunk(). It's their job to
respect the gfp flags and Do The Right Thing(tm). Can you fix the
problematic places instead? It seems like it's mostly the vmalloc backend
that's problematic.

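For anyone following along, here is a rough userspace model of what the
scope helpers in this patch change. It is not kernel code: every MODEL_*
name and bit value is made up for illustration. The only thing taken from
the kernel is the masking rule in current_gfp_context(), where a NOIO scope
clears both __GFP_IO and __GFP_FS and a NOFS scope clears only __GFP_FS.
Under those assumptions, it sketches how a nested GFP_KERNEL request inside
a scope gets downgraded even though the callee ignored its gfp argument.

```c
#include <assert.h>

/* Hypothetical bit values for this model only, not the kernel's. */
#define MODEL_GFP_IO		0x1u
#define MODEL_GFP_FS		0x2u
#define MODEL_GFP_RECLAIM	0x4u
#define MODEL_GFP_KERNEL	(MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO		(MODEL_GFP_RECLAIM)

#define MODEL_PF_NOIO		0x1u
#define MODEL_PF_NOFS		0x2u

/* Stands in for current->flags in this single-threaded model. */
static unsigned int model_task_flags;

static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Masking rule modelled on current_gfp_context(): NOIO implies NOFS. */
static unsigned int model_current_gfp_context(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Same shape as the helpers in the quoted patch. */
static unsigned int model_scope_save(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_IO))
		return model_noio_save();
	if (!(gfp & MODEL_GFP_FS))
		return model_nofs_save();
	return 0;
}

static void model_scope_restore(unsigned int gfp, unsigned int old)
{
	if (!(gfp & MODEL_GFP_IO))
		model_noio_restore(old);
	else if (!(gfp & MODEL_GFP_FS))
		model_nofs_restore(old);
}

static void model_selfcheck(void)
{
	unsigned int scope;

	/* NOFS caller: a nested GFP_KERNEL request loses only the FS bit. */
	scope = model_scope_save(MODEL_GFP_NOFS);
	assert(model_current_gfp_context(MODEL_GFP_KERNEL) == MODEL_GFP_NOFS);
	model_scope_restore(MODEL_GFP_NOFS, scope);

	/* NOIO caller: a nested GFP_KERNEL request loses both IO and FS. */
	scope = model_scope_save(MODEL_GFP_NOIO);
	assert(model_current_gfp_context(MODEL_GFP_KERNEL) == MODEL_GFP_NOIO);
	model_scope_restore(MODEL_GFP_NOIO, scope);

	/* Outside any scope the mask passes through untouched. */
	assert(model_current_gfp_context(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
}
```

Calling model_selfcheck() from a test main() exercises all three cases. The
scope does work mechanically, but it hides the fact that the callee dropped
the gfp mask it was handed, which is the part I would rather see fixed.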
> +
> /**
> * pcpu_alloc - the percpu allocator
> * @size: size of area to allocate in bytes
> @@ -1901,7 +1918,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>
>  	/* No space left. Create a new chunk. */
>  	if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) {
> +		unsigned int pcpu_scope;
> +
> +		pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
>  		chunk = pcpu_create_chunk(pcpu_gfp);
> +		pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
> +
>  		if (!chunk) {
>  			err = "failed to allocate new chunk";
>  			goto fail;
> @@ -1931,9 +1953,13 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
>  		page_end = PFN_UP(off + size);
>
>  		for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
> +			unsigned int pcpu_scope;
> +
>  			WARN_ON(chunk->immutable);
>
> +			pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
>  			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
> +			pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
>
>  			spin_lock_irqsave(&pcpu_lock, flags);
>  			if (ret) {
> --
> 2.50.1 (Apple Git-155)
>
--
Pedro