Re: [PATCH 1/2] mm/percpu: Preserve NOFS/NOIO scope during chunk create and populate
From: Pedro Falcato
Date: Fri May 29 2026 - 05:41:55 EST
On Fri, May 29, 2026 at 10:25:28AM +0100, Pedro Falcato wrote:
> On Thu, May 28, 2026 at 09:29:16PM +0800, Kaitao Cheng wrote:
> > From: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
> >
> > pcpu_alloc_noprof() derives pcpu_gfp from the caller supplied GFP mask and
> > passes it to the backing percpu allocators. This preserves GFP_NOFS and
> > GFP_NOIO for pcpu_alloc_pages() and for the initial pcpu_chunk allocation.
> >
> > However, the chunk creation and population slow paths also call helpers
> > which do not take a GFP mask and perform internal allocations with
> > GFP_KERNEL. For example, pcpu_create_chunk() calls pcpu_get_vm_areas(),
> > and population can allocate temporary metadata or page tables while mapping
> > backing pages. As a result, a caller which explicitly uses GFP_NOFS or
> > GFP_NOIO can still enter FS or IO reclaim while creating or populating a
> > percpu chunk.
> >
> > This is problematic for callers which use GFP_NOFS or GFP_NOIO because
> > they are already holding filesystem or IO-path locks. If free chunks are
> > exhausted, the percpu allocation can take pcpu_alloc_mutex and then enter
> > unconstrained reclaim from these internal allocations, defeating the
> > caller's allocation context and potentially recreating reclaim lock
> > dependencies.
> >
> > Wrap chunk creation and population in a scoped NOIO or NOFS context when
> > pcpu_gfp has the corresponding constraints. Leave ordinary GFP_KERNEL
> > allocations unchanged so they retain full reclaim capability.
> >
> > Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
>
> I assume you _did not_ observe this in production? As in no reclaim path should be
> insane^W daring enough to do pcpu allocations?
Oops, I mixed my issues up. This is purely a GFP flags issue. A quick
"git grep alloc_percpu_gfp" shows that the vast majority (all?) of callers
pass some combination of GFP_KERNEL or GFP_ATOMIC plus other GFP flags, but
none pass NOFS or NOIO as far as I can see. So you probably did not observe
this?
>
> > Signed-off-by: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
> > ---
> > mm/percpu.c | 26 ++++++++++++++++++++++++++
> > 1 file changed, 26 insertions(+)
> >
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 71a85d7245c7..1bb38467390b 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -1778,6 +1778,23 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
> > }
> > #endif
> >
> > +static unsigned int pcpu_memalloc_scope_save(gfp_t gfp)
> > +{
> > +	if (!(gfp & __GFP_IO))
> > +		return memalloc_noio_save();
> > +	if (!(gfp & __GFP_FS))
> > +		return memalloc_nofs_save();
> > +	return 0;
> > +}
> > +
> > +static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
> > +{
> > +	if (!(gfp & __GFP_IO))
> > +		memalloc_noio_restore(flags);
> > +	else if (!(gfp & __GFP_FS))
> > +		memalloc_nofs_restore(flags);
> > +}
>
> I disagree with this. We already have gfp flags, they're already passed to pcpu_create_chunk()
> and pcpu_populate_chunk(). It's their job to respect the gfp flags and
> Do The Right Thing(tm). Can you fix the problematic places? It seems like it's
> mostly the vmalloc backend that's problematic.
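
For anyone following along, here is a rough userspace model of what the
scope helpers quoted above are meant to achieve. It is only a sketch: every
MODEL_* and model_* name is a hypothetical stand-in, the bit values are
illustrative rather than the kernel's, and model_effective_gfp() only
approximates the masking that current_gfp_context() applies. It shows a
helper that hardcodes GFP_KERNEL ending up with the caller's constraint
while the scope is active.

```c
#include <assert.h>

/*
 * Userspace model only. All MODEL_* / model_* names are hypothetical
 * stand-ins and the bit values are illustrative, not the kernel's.
 */
typedef unsigned int gfp_t;

#define MODEL_GFP_IO		0x1u	/* stands in for __GFP_IO */
#define MODEL_GFP_FS		0x2u	/* stands in for __GFP_FS */
#define MODEL_GFP_KERNEL	(MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_IO)
#define MODEL_GFP_NOIO		0x0u

#define MODEL_PF_NOIO		0x1u	/* stands in for PF_MEMALLOC_NOIO */
#define MODEL_PF_NOFS		0x2u	/* stands in for PF_MEMALLOC_NOFS */

static unsigned int model_task_flags;	/* stands in for current->flags */

/* Set the NOIO bit and hand back its previous state. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Set the NOFS bit and hand back its previous state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Approximates current_gfp_context(): the active scope masks the request. */
static gfp_t model_effective_gfp(gfp_t gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Same branch structure as pcpu_memalloc_scope_save() in the patch. */
static unsigned int model_scope_save(gfp_t gfp)
{
	if (!(gfp & MODEL_GFP_IO))
		return model_noio_save();
	if (!(gfp & MODEL_GFP_FS))
		return model_nofs_save();
	return 0;
}

/* Same branch structure as pcpu_memalloc_scope_restore() in the patch. */
static void model_scope_restore(gfp_t gfp, unsigned int old)
{
	if (!(gfp & MODEL_GFP_IO))
		model_noio_restore(old);
	else if (!(gfp & MODEL_GFP_FS))
		model_nofs_restore(old);
}

/* A helper that takes no GFP mask and hardcodes GFP_KERNEL internally. */
static gfp_t model_internal_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Mask seen by the hardcoded allocation under the caller's scope. */
static gfp_t model_scoped_call(gfp_t caller_gfp)
{
	unsigned int old = model_scope_save(caller_gfp);
	gfp_t seen = model_internal_alloc();

	model_scope_restore(caller_gfp, old);
	return seen;
}

static void model_self_check(void)
{
	unsigned int outer;

	/* Each caller constraint carries over to the hardcoded allocation. */
	assert(model_scoped_call(MODEL_GFP_KERNEL) == MODEL_GFP_KERNEL);
	assert(model_scoped_call(MODEL_GFP_NOFS) == MODEL_GFP_NOFS);
	assert(model_scoped_call(MODEL_GFP_NOIO) == MODEL_GFP_NOIO);
	assert(model_task_flags == 0);

	/* Nesting: the inner restore must not clear an outer NOFS scope. */
	outer = model_nofs_save();
	assert(model_scoped_call(MODEL_GFP_NOFS) == MODEL_GFP_NOFS);
	assert(model_task_flags & MODEL_PF_NOFS);
	model_nofs_restore(outer);
	assert(model_task_flags == 0);
}
```

Calling model_self_check() runs through the GFP_KERNEL, NOFS, NOIO and
nested cases. My point stands though: this only papers over helpers that
drop the mask, and I'd still rather see the mask plumbed through them.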
>
> > +
> > /**
> > * pcpu_alloc - the percpu allocator
> > * @size: size of area to allocate in bytes
> > @@ -1901,7 +1918,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> >
> > /* No space left. Create a new chunk. */
> > if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) {
> > + unsigned int pcpu_scope;
> > +
> > + pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
> > chunk = pcpu_create_chunk(pcpu_gfp);
> > + pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
> > +
> > if (!chunk) {
> > err = "failed to allocate new chunk";
> > goto fail;
> > @@ -1931,9 +1953,13 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
> > page_end = PFN_UP(off + size);
> >
> > for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
> > + unsigned int pcpu_scope;
> > +
> > WARN_ON(chunk->immutable);
> >
> > + pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
> > ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
> > + pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
> >
> > spin_lock_irqsave(&pcpu_lock, flags);
> > if (ret) {
> > --
> > 2.50.1 (Apple Git-155)
> >
>
> --
> Pedro
--
Pedro