[PATCH 1/2] mm/percpu: Preserve NOFS/NOIO scope during chunk create and populate

From: Kaitao Cheng

Date: Thu May 28 2026 - 09:36:30 EST


From: Kaitao Cheng <chengkaitao@xxxxxxxxxx>

pcpu_alloc_noprof() derives pcpu_gfp from the caller-supplied GFP mask and
passes it to the backing percpu allocators. This preserves GFP_NOFS and
GFP_NOIO for pcpu_alloc_pages() and for the initial pcpu_chunk allocation.

However, the chunk creation and population slow paths also call helpers
which do not take a GFP mask and perform internal allocations with
GFP_KERNEL. For example, pcpu_create_chunk() calls pcpu_get_vm_areas(),
and population can allocate temporary metadata or page tables while mapping
backing pages. As a result, a caller which explicitly uses GFP_NOFS or
GFP_NOIO can still enter FS or IO reclaim while creating or populating a
percpu chunk.

This is a problem because callers typically pass GFP_NOFS or GFP_NOIO when
they already hold filesystem or IO-path locks. If free chunks are
exhausted, the percpu allocation can take pcpu_alloc_mutex and then enter
unconstrained reclaim from these internal allocations, defeating the
caller's allocation context and potentially recreating the reclaim lock
dependencies that the caller's mask was meant to avoid.

Wrap chunk creation and population in a scoped NOIO context
(memalloc_noio_save()) when pcpu_gfp lacks __GFP_IO, or in a scoped NOFS
context (memalloc_nofs_save()) when it lacks only __GFP_FS. Leave ordinary
GFP_KERNEL allocations unchanged so they retain full reclaim capability.

Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
Signed-off-by: Kaitao Cheng <chengkaitao@xxxxxxxxxx>
---
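The effect of the scoping described in the commit message can be illustrated
with a small userspace model. This is only a sketch: every MODEL_* value and
model_* function below is a hypothetical stand-in invented for illustration,
not a kernel symbol, and the narrowing step is an assumption modeled on how
the kernel applies a memalloc scope to an allocation mask. It shows that a
helper which allocates with a GFP_KERNEL-like mask internally ends up
constrained to the caller's NOFS or NOIO mask once wrapped.

```c
#include <assert.h>

/*
 * Userspace model only. The bit values are stand-ins chosen for this
 * sketch, not the kernel's real __GFP_* or PF_MEMALLOC_* values, and the
 * reclaim bits are left out entirely.
 */
#define MODEL_GFP_IO		0x1u
#define MODEL_GFP_FS		0x2u
#define MODEL_GFP_KERNEL	(MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS		(MODEL_GFP_IO)
#define MODEL_GFP_NOIO		0x0u

#define MODEL_PF_NOIO		0x1u
#define MODEL_PF_NOFS		0x2u

/* Stands in for the per-task flags word. */
static unsigned int model_task_flags;

/* Set the NOIO scope bit and return its previous state. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | old;
}

/* Set the NOFS scope bit and return its previous state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* Narrow a mask according to the active scope; NOIO also implies NOFS. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* Same shape as the save helper added by this patch. */
static unsigned int model_scope_save(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_IO))
		return model_noio_save();
	if (!(gfp & MODEL_GFP_FS))
		return model_nofs_save();
	return 0;
}

/* Same shape as the restore helper added by this patch. */
static void model_scope_restore(unsigned int gfp, unsigned int old)
{
	if (!(gfp & MODEL_GFP_IO))
		model_noio_restore(old);
	else if (!(gfp & MODEL_GFP_FS))
		model_nofs_restore(old);
}

/*
 * Models a helper that takes no mask and allocates with a GFP_KERNEL-like
 * mask internally; returns the mask that allocation would effectively use.
 */
static unsigned int model_internal_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Models wrapping such a helper in a scope derived from the caller's mask. */
static unsigned int model_scoped_internal_alloc(unsigned int caller_gfp)
{
	unsigned int scope = model_scope_save(caller_gfp);
	unsigned int used = model_internal_alloc();

	model_scope_restore(caller_gfp, scope);
	return used;
}
```

In this model, calling model_internal_alloc() outside any scope yields the
full MODEL_GFP_KERNEL mask, while model_scoped_internal_alloc() with
MODEL_GFP_NOFS or MODEL_GFP_NOIO yields the caller's narrower mask and leaves
model_task_flags as it found them.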
mm/percpu.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index 71a85d7245c7..1bb38467390b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1778,6 +1778,23 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
}
#endif

+static unsigned int pcpu_memalloc_scope_save(gfp_t gfp)
+{
+	if (!(gfp & __GFP_IO))
+		return memalloc_noio_save();
+	if (!(gfp & __GFP_FS))
+		return memalloc_nofs_save();
+	return 0;
+}
+
+static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
+{
+	if (!(gfp & __GFP_IO))
+		memalloc_noio_restore(flags);
+	else if (!(gfp & __GFP_FS))
+		memalloc_nofs_restore(flags);
+}
+
/**
* pcpu_alloc - the percpu allocator
* @size: size of area to allocate in bytes
@@ -1901,7 +1918,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,

 	/* No space left. Create a new chunk. */
 	if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) {
+		unsigned int pcpu_scope;
+
+		pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
 		chunk = pcpu_create_chunk(pcpu_gfp);
+		pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
+
 		if (!chunk) {
 			err = "failed to allocate new chunk";
 			goto fail;
@@ -1931,9 +1953,13 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 		page_end = PFN_UP(off + size);

 		for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
+			unsigned int pcpu_scope;
+
 			WARN_ON(chunk->immutable);

+			pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
 			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
+			pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);

 			spin_lock_irqsave(&pcpu_lock, flags);
 			if (ret) {
--
2.50.1 (Apple Git-155)