[RFC PATCH 03/45] mm: page_alloc: use trylock for PCP lock in free path to avoid lock inversion

From: Rik van Riel

Date: Thu Apr 30 2026 - 16:32:02 EST


From: Rik van Riel <riel@xxxxxxxx>

The per-cpu pageblock buddy allocator changed __free_frozen_pages() and
free_unref_folios() to use a blocking spin_lock_irqsave() for the PCP
lock when in_task(), rather than mainline's unconditional trylock via
pcp_spin_trylock().

This breaks a mainline invariant: the free path never blocks on pcp->lock.
The invariant matters because the allocation path in rmqueue_pcplist()
acquires pcp->lock via pcp_spin_trylock(), which on SMP does
preempt_disable() + spin_trylock() without disabling IRQs. The alloc
path therefore holds pcp->lock with interrupts enabled.

The resulting ABBA deadlock scenario:

  CPU0 (alloc path):  pcp_spin_trylock() acquires pcp->lock (IRQs ON)
                      -> hardirq fires while lock is held
                        -> IRQ handler tries to take xa_lock
                           (e.g. __folio_end_writeback -> xa_lock)

  CPU1 (free path):   xa_lock held (e.g. slab -> stack_depot_free)
                      -> __free_frozen_pages()
                        -> spin_lock_irqsave(&pcp->lock)  BLOCKS
                           -> waits for CPU0

  CPU0 cannot release pcp->lock because it is stuck in hardirq
  waiting for xa_lock held by CPU1. Deadlock.

The key insight is that pcp_trylock_prepare() is a no-op on SMP, so
pcp_spin_trylock() does not save/restore IRQs. Any lock that is taken in
hardirq context and is also held across a call to __free_frozen_pages()
can therefore produce this ABBA deadlock.

Fix by always using spin_trylock_irqsave() for the PCP lock, falling
back to free_one_page() (zone buddy) when the trylock fails. This
restores the mainline invariant of never blocking on PCP lock acquisition
in the free path.

Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
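Not for the changelog: the trylock-then-fallback pattern described above
can be sketched as a small userspace model. This is only an illustration
under stated assumptions. Every model_* name is hypothetical and does not
exist in the kernel, a C11 atomic_flag stands in for pcp->lock, and a
plain counter stands in for the free_one_page() fallback (which in the
kernel takes zone->lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock that is only ever trylocked. */
struct model_lock {
	atomic_flag held;
};

/* Hypothetical stand-in for the per-CPU page cache. */
struct model_pcp {
	struct model_lock lock;	/* models pcp->lock */
	int cached;		/* pages parked on the per-CPU list */
};

/* Hypothetical stand-in for the zone buddy lists. */
struct model_zone {
	int buddy;		/* pages handed straight to the buddy fallback */
};

static void model_pcp_init(struct model_pcp *pcp)
{
	atomic_flag_clear(&pcp->lock.held);
	pcp->cached = 0;
}

/* Returns true if the lock was taken, false if it was already held. */
static bool model_trylock(struct model_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Models the free path after this patch: never wait for the PCP lock.
 * Returns true if the page was cached, false if the fallback was used.
 */
static bool model_free_page(struct model_pcp *pcp, struct model_zone *zone)
{
	assert(pcp && zone);

	if (!model_trylock(&pcp->lock)) {
		/* Contended: fall back instead of blocking. */
		zone->buddy++;
		return false;
	}

	pcp->cached++;
	model_unlock(&pcp->lock);
	return true;
}
```

In this model a caller that already holds some other lock can never end
up waiting on the PCP lock holder, which is the property the patch is
meant to restore. The cost is that a contended free skips the per-CPU
cache.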
mm/page_alloc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0aa39fa2f61..d98eab3e288e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3262,13 +3262,15 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 	cache_cpu = raw_smp_processor_id();

 	pcp = per_cpu_ptr(zone->per_cpu_pageset, cache_cpu);
-	if (unlikely(fpi_flags & FPI_TRYLOCK) || !in_task()) {
-		if (!spin_trylock_irqsave(&pcp->lock, UP_flags)) {
-			free_one_page(zone, page, pfn, order, fpi_flags);
-			return;
-		}
-	} else {
-		spin_lock_irqsave(&pcp->lock, UP_flags);
+	/*
+	 * Always use trylock: callers may hold locks (e.g. xa_lock via
+	 * slab/stack_depot) that are also taken in hardirq context, and
+	 * pcp->lock is acquired with IRQs enabled on the allocation side.
+	 * A blocking lock here would create an ABBA deadlock potential.
+	 */
+	if (!spin_trylock_irqsave(&pcp->lock, UP_flags)) {
+		free_one_page(zone, page, pfn, order, fpi_flags);
+		return;
 	}

 	if (unlikely(pcp->flags & PCPF_CPU_DEAD)) {
--
2.52.0