[PATCH] x86/mm/pat: take cpa_lock around large-page collapse

From: Denis V. Lunev

Date: Fri Jun 26 2026 - 12:39:58 EST


Loading and unloading modules concurrently on several CPUs on a KASAN
build, with a short delay injected at the CPA page-table lookup to
widen the window, faults within minutes:

BUG: KASAN: use-after-free in __change_page_attr+0x7cc/0x7e0
Write of size 8 at addr ffff888181139718 by task modprobe
...
The buggy address belongs to the physical page:
pfn:0x181139 ... page_type: f2(table)

cpa_collapse_large_pages() rebuilds a leaf PMD from its 4K PTEs and
frees the old PTE-table pages, while __change_page_attr() fetches a
PTE pointer from a lockless lookup_address_in_pgd_attr() and only
later writes through it with set_pte_atomic(). When module text is
served from a shared large ROX mapping, the two run on the same PMD:

  CPU A (module load)              CPU B (module finalize)
  -------------------              -----------------------
  execmem_make_temp_rw
    set_memory_nx
      __change_page_attr
        split 2M -> 4K table P
        kpte = &P[i] (lockless)
                                   execmem_restore_rox
                                     set_memory_rox (CPA_COLLAPSE)
                                       cpa_collapse_large_pages
                                         rebuild leaf PMD
                                         flush_tlb_all
                                         pagetable_free(P)
        set_pte_atomic(kpte, ...)
          -> writes into freed P

P is a page-table page (page_type: table), reused at once, so the
write corrupts whatever got the page next: a bad-pte or bad-page
splat, or a fatal fault once P has been turned into read-only text.

The flush_tlb_all() before the free does not close the window: its IPI
only serializes against page-table walkers that run with interrupts
off (e.g. GUP-fast). The walk in __change_page_attr() runs with
interrupts on, so nothing stops it from holding a stale pointer into P.

Serialize the collapse - the PMD rebuild, TLB flush and PTE-table
free - under cpa_lock, the lock __change_page_attr() takes for the
split path, so a concurrent walker can no longer hold a pointer into
a table the collapse is about to free.

debug_pagealloc bypasses cpa_lock in __change_page_attr() (the direct
map is 4K then, with no large pages to serialize), so the lock cannot
order the two there. Skip the collapse in that config: it is only an
optimization, and not freeing the tables leaves the unserialized walk
nothing to race with.

Fixes: 41d88484c71c ("x86/mm/pat: restore large ROX pages after fragmentation")
Signed-off-by: Denis V. Lunev <den@xxxxxxxxxx>
---
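Note for reviewers, not for the changelog: the ordering argument above
can be sketched as a simplified, single-threaded user-space model.
Everything in it (struct model, walker_lookup(), walker_write(),
collapse_free(), replay()) is hypothetical and exists only for
illustration. It assumes the walker's lookup and write sit inside one
cpa_lock critical section, and it replays the interleaving from the
diagram rather than exercising real concurrency.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code. One flag stands in
 * for cpa_lock and one for "PTE table P has been freed".
 */
struct model {
	bool lock_held;		/* stands in for cpa_lock */
	bool table_freed;	/* set once the collapse has freed table P */
	int stale_writes;	/* walker writes that landed in a freed table */
};

/* Walker step 1: take the lock and fetch a pointer into P. */
static bool walker_lookup(struct model *m)
{
	if (m->lock_held)
		return false;	/* a real CPU would spin here */
	m->lock_held = true;
	return true;
}

/* Walker step 2: write through the pointer, then drop the lock. */
static void walker_write(struct model *m)
{
	if (m->table_freed)
		m->stale_writes++;	/* the use-after-free from the report */
	m->lock_held = false;
}

/*
 * Collapse step: free P. When takes_lock is set, the step is refused
 * while the walker owns the lock; otherwise it runs unconditionally.
 * The lock is treated as taken and released within this one step.
 */
static bool collapse_free(struct model *m, bool takes_lock)
{
	if (takes_lock && m->lock_held)
		return false;	/* a real CPU would spin until the walker is done */
	m->table_freed = true;
	return true;
}

/* Replay the interleaving from the diagram; return the stale-write count. */
static int replay(bool collapse_takes_lock)
{
	struct model m = { 0 };
	bool freed_early;

	walker_lookup(&m);
	freed_early = collapse_free(&m, collapse_takes_lock);
	walker_write(&m);
	if (!freed_early)
		collapse_free(&m, collapse_takes_lock);	/* retry after unlock */

	return m.stale_writes;
}
```

In this model, replay(false) corresponds to the collapse running
without the lock and counts one write into the freed table, while
replay(true) corresponds to the collapse taking the lock and counts
none, because the free is pushed past the walker's write.
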
 arch/x86/mm/pat/set_memory.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index d023a40a1e03..ff6e3f612986 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -418,6 +418,16 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
 	int collapsed = 0;
 	int i;

+	/*
+	 * debug_pagealloc bypasses cpa_lock, so __change_page_attr() walks
+	 * unserialized and freeing collapsed PTE-tables could race it; skip
+	 * the optional merge there.
+	 */
+	if (debug_pagealloc_enabled())
+		return;
+
+	spin_lock(&cpa_lock);
+
 	if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY)) {
 		for (i = 0; i < cpa->numpages; i++)
 			collapsed += collapse_large_pages(__cpa_addr(cpa, i),
@@ -431,8 +441,10 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
 			collapsed += collapse_large_pages(addr, &pgtables);
 	}

-	if (!collapsed)
+	if (!collapsed) {
+		spin_unlock(&cpa_lock);
 		return;
+	}

 	flush_tlb_all();

@@ -440,6 +452,8 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
 		list_del(&ptdesc->pt_list);
 		pagetable_free(ptdesc);
 	}
+
+	spin_unlock(&cpa_lock);
 }

 static void cpa_flush(struct cpa_data *cpa, int cache)

base-commit: b81d185839fade27f7c4e885856696cf497d53c1
--
2.53.0