[RFC PATCH 27/40] mm: page_alloc: cross-migratetype buddy borrow within tainted SPBs

From: Rik van Riel

Date: Wed May 20 2026 - 11:20:39 EST


When pages are freed via __free_one_page they're placed on the
per-SPB free_list determined by their pageblock's migratetype, not
the original allocation's migratetype. Slab-heavy and cache-heavy
workloads both expose structural mismatches that leave non-movable
allocations stranded:

- RECLAIMABLE pageblocks fill up densely with live slab objects
(e.g. btrfs_inode caches), leaving very few sub-pageblock free
fragments on the RECL free list.
- UNMOVABLE pageblocks accumulate sparse free space from vmalloc
and raw-alloc churn -- tens of thousands of free pages, all
on the UNMOV free list.
- MOVABLE-tagged pageblocks in tainted SPBs absorb freed
page-cache and anon-LRU pages, accumulating large pools all on
the MOVABLE free list -- invisible to non-movable demand even
though the tainted SPB has plenty of unused space.
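A minimal C sketch of the placement rule above, for illustration only. Every name in it (`toy_mt`, `toy_pageblock`, `toy_free_lists`, `toy_free_page`, `toy_visible_to`) is hypothetical and not part of mm/page_alloc.c. It assumes one order-0 free list per migratetype and shows only one thing: the destination list follows the pageblock tag. A RECL page freed from an UNMOV-tagged pageblock therefore lands on the UNMOV list.

```c
/*
 * Toy model (hypothetical names, not kernel API): which free list a
 * freed page lands on.
 */
enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_NR_MT };

struct toy_pageblock {
	enum toy_mt tag;			/* the pageblock's migratetype */
};

struct toy_free_lists {
	unsigned long nr_free[TOY_NR_MT];	/* order-0 pages per list */
};

/*
 * Models the rule described for __free_one_page(): the page goes back
 * on the list named by its pageblock's tag. @alloc_mt, the type the
 * page was allocated as, is deliberately ignored.
 */
static void toy_free_page(struct toy_free_lists *fl,
			  const struct toy_pageblock *pb,
			  enum toy_mt alloc_mt)
{
	(void)alloc_mt;
	fl->nr_free[pb->tag]++;
}

/* Free pages an allocation of type @mt can see on its own list. */
static unsigned long toy_visible_to(const struct toy_free_lists *fl,
				    enum toy_mt mt)
{
	return fl->nr_free[mt];
}
```

If two RECL slab pages are freed out of an UNMOV-tagged pageblock, toy_visible_to() reports 2 for UNMOV and 0 for RECL. That is the stranding the bullets describe.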

Add two new passes between Pass 2b and Pass 3 of __rmqueue_smallest,
both restricted to SB_TAINTED (clean SPBs must not be polluted with
cross-type mixing) and both purely transient borrows (no pageblock
relabel; the borrowed page returns to its source list when freed):

Pass 2c -- cross-non-movable borrow. UNMOV alloc tries the
RECL free list; RECL alloc tries the UNMOV free list. Restricted
to UNMOV <-> RECL.

Pass 2d -- cross-MOV borrow. Non-movable alloc tries the
MOVABLE free list of a tainted SPB. Tradeoff: the borrowed
UNMOV/RECL content blocks compaction of its source pageblock
until freed; restricted to SB_TAINTED so contamination is bounded
to one pageblock inside an already-tainted SPB. The alternative
-- Pass 3 tainting a fresh clean SPB -- removes a 1 GiB region
from the clean pool, which is strictly worse for the anti-
fragmentation invariant the series is built around.
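The control flow of the two passes can be modelled as a sketch under stated assumptions. Free space is tracked as per-order, per-list buddy counts rather than real lists. The SPB array is assumed to already be in the fullness order the patch walks. `TOY_PAGEBLOCK_ORDER` and `TOY_NR_ORDERS` are hypothetical stand-ins for pageblock_order and NR_PAGE_ORDERS, and all `toy_*` names are hypothetical. The sketch follows the patch's ordering: Pass 2c runs over every tainted SPB before Pass 2d starts. Each pass scans from the largest sub-pageblock order down, and the split remainders stay on the source list, as the patch describes page_del_and_expand() doing.

```c
#include <stdbool.h>

#define TOY_NR_ORDERS		11	/* hypothetical stand-in for NR_PAGE_ORDERS */
#define TOY_PAGEBLOCK_ORDER	9	/* hypothetical stand-in for pageblock_order */

enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_NR_MT };

struct toy_spb {
	bool tainted;
	/* Free buddies per order, per migratetype free list. */
	unsigned long nr_free[TOY_NR_ORDERS][TOY_NR_MT];
};

/* Pass 2c pairing: UNMOV <-> RECL only; -1 for anything else. */
static int toy_opposite_mt(enum toy_mt mt)
{
	if (mt == TOY_UNMOVABLE)
		return TOY_RECLAIMABLE;
	if (mt == TOY_RECLAIMABLE)
		return TOY_UNMOVABLE;
	return -1;
}

/*
 * Take one buddy of at least @order from @src's list in a tainted SPB.
 * No relabel: the split-off halves go back on @src's list.
 * Returns the order actually taken, or -1.
 */
static int toy_borrow(struct toy_spb *sb, unsigned int order, enum toy_mt src)
{
	int co, high;
	int top = TOY_PAGEBLOCK_ORDER - 1 < TOY_NR_ORDERS - 1 ?
		  TOY_PAGEBLOCK_ORDER - 1 : TOY_NR_ORDERS - 1;

	if (!sb->tainted)
		return -1;			/* clean SPBs are never mixed */

	for (co = top; co >= (int)order; --co) {	/* largest first, as in the patch */
		if (!sb->nr_free[co][src])
			continue;
		sb->nr_free[co][src]--;
		for (high = co - 1; high >= (int)order; --high)
			sb->nr_free[high][src]++;	/* expand() remainder */
		return co;
	}
	return -1;
}

/*
 * Pass 2c across all SPBs, then Pass 2d across all SPBs. On success,
 * stores the source list in *src and returns the order taken.
 */
static int toy_rmqueue_borrow(struct toy_spb *sbs, int nr_sb,
			      unsigned int order, enum toy_mt mt,
			      enum toy_mt *src)
{
	int opp = toy_opposite_mt(mt);
	int i, got;

	if (opp < 0)
		return -1;			/* movable allocs skip both (Pass 4) */

	for (i = 0; i < nr_sb; i++) {		/* Pass 2c */
		got = toy_borrow(&sbs[i], order, (enum toy_mt)opp);
		if (got >= 0) {
			*src = (enum toy_mt)opp;
			return got;
		}
	}
	for (i = 0; i < nr_sb; i++) {		/* Pass 2d */
		got = toy_borrow(&sbs[i], order, TOY_MOVABLE);
		if (got >= 0) {
			*src = TOY_MOVABLE;
			return got;
		}
	}
	return -1;				/* the real code goes on to Pass 3 */
}
```

For example, with a clean SPB holding an order-3 UNMOV buddy and a tainted SPB holding only an order-4 MOVABLE buddy, an order-1 RECL request skips the clean SPB. It then finds nothing in Pass 2c and borrows the order-4 MOVABLE buddy in Pass 2d, leaving order-3, order-2 and order-1 remainders on the MOVABLE list.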

PB_has_<requested_type> is set via __spb_set_has_type so spb_defrag
accounting reflects that the pageblock now hosts our type's
content. PB_has_<source_type> stays set since other buddies of
that type remain.
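The PB_has_* bookkeeping can be sketched with a per-pageblock bitmask. This is a hypothetical model: `toy_pb`, `toy_record_borrow` and the bit layout are assumptions, not the kernel's pageblock-flags encoding. A borrow sets the requester's bit and leaves both the source type's bit and the pageblock tag untouched.

```c
#include <stdbool.h>

enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_NR_MT };

/* Hypothetical pageblock state: tag plus one "has content" bit per type. */
struct toy_pb {
	enum toy_mt tag;
	unsigned int has;	/* bit (1u << mt) set => hosts @mt content */
};

static bool toy_has_type(const struct toy_pb *pb, enum toy_mt mt)
{
	return pb->has & (1u << mt);
}

/*
 * Analogue of calling __spb_set_has_type() for the requester after a
 * borrow. The source type's bit is not cleared (other buddies of that
 * type remain) and the tag is not changed (no relabel).
 */
static void toy_record_borrow(struct toy_pb *pb, enum toy_mt requested)
{
	pb->has |= 1u << requested;
}
```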

Movable allocations don't participate (they have Pass 4), CMA
allocations are excluded, and neither pass borrows from an isolated
or CMA pageblock. Observable as SPB_ALLOC_OUTCOME_PASS_2C and
SPB_ALLOC_OUTCOME_PASS_2D on the spb_alloc_walk tracepoint.
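The eligibility rules can be written as a set of hypothetical predicates. This sketch assumes the patch's `movable` flag means a MIGRATE_MOVABLE request, and it folds CMA into the toy enum. All `toy_*` names are illustrative only. The source-side check mirrors the get_pageblock_isolate() and is_migrate_cma(get_pageblock_migratetype(page)) tests that both passes apply before taking a buddy.

```c
#include <stdbool.h>

enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_CMA };

/* Source pageblock as seen by the borrow loops (hypothetical). */
struct toy_src_pb {
	enum toy_mt tag;
	bool isolated;
};

/* Pass 2c runs only for the UNMOV <-> RECL pair. */
static bool toy_pass_2c_allowed(enum toy_mt mt)
{
	return mt == TOY_UNMOVABLE || mt == TOY_RECLAIMABLE;
}

/* Pass 2d: any non-movable, non-CMA request. */
static bool toy_pass_2d_allowed(enum toy_mt mt)
{
	return mt != TOY_MOVABLE && mt != TOY_CMA;
}

/* Neither pass takes a buddy from an isolated or CMA pageblock. */
static bool toy_src_usable(const struct toy_src_pb *pb)
{
	return !pb->isolated && pb->tag != TOY_CMA;
}
```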

Live measurement on a 250 GB system with a btrfs root
(Stage 1 + simplified Stage 2a) at boot+7min: the number of tainted
Normal-zone SPBs grew from a baseline of 4 to 12, even though the
existing 11 each had between 825 and 87,062 free pages, ALL on the
UNMOV list, while the workload kept allocating RECL btrfs_inode slab
pages. Pass 2c lets those allocations draw on the existing
UNMOV-listed free pool rather than tainting fresh SPBs; Pass 2d
extends the same idea to the MOV-listed free pool that page-cache
reclaim leaves behind.

Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
Assisted-by: Claude:claude-opus-4.7 syzkaller
---
mm/page_alloc.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 156 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e4ecddb428c3..ce8cd99dd283 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2820,6 +2820,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
struct page *page;
int full;
struct superpageblock *sb;
+ int opposite_mt;
/*
* Category search order: 2 passes.
* Movable: clean first, then tainted (pack into clean SBs).
@@ -2999,6 +3000,161 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
}
}
}
+
+ /*
+ * Pass 2c: cross-non-movable borrow within tainted SPBs.
+ *
+ * If we're a non-movable alloc and Pass 1/2/2b couldn't find a
+ * buddy on our migratetype's free list anywhere, but tainted
+ * SPBs have free buddies on the *opposite* non-movable type's
+ * free list, take one of those.
+ *
+ * Why this happens: when pages are freed, __free_one_page puts
+ * them on the free_list determined by their pageblock's tag,
+ * not the original allocation's migratetype. Slab caches tend
+ * to be dense (RECL pageblocks fill up; few sub-PB fragments),
+ * while UNMOV pageblocks accumulate sparse free space from
+ * vmalloc/raw alloc churn. Net effect: tainted SPBs frequently
+ * have tens of thousands of free pages all on the UNMOV list,
+ * invisible to RECL allocs (or vice versa). Without this pass,
+ * the alloc falls through to Pass 3 and taints a fresh clean
+ * SPB even though the existing tainted ones have plenty of
+ * unused space.
+ *
+ * We do NOT relabel the source pageblock. The buddy is taken
+ * from @opposite_mt's free list and the splits go back on
+ * @opposite_mt's list (page_del_and_expand uses the same mt
+ * for delete and expand). The pageblock tag is unchanged, so
+ * the page returns to @opposite_mt's list when freed via
+ * __free_one_page. Effectively a borrow: the alloc takes a
+ * physical page from a UNMOV-tagged pageblock for a RECL
+ * use, and the page cycles back to UNMOV's list on free.
+ *
+ * We do set PB_has_<migratetype> via __spb_set_has_type so
+ * spb_defrag accounting reflects that this pageblock now hosts
+ * our migratetype's content too. PB_has_<opposite_mt> stays
+ * set since other buddies of that type remain.
+ *
+ * Restricted to UNMOV <-> RECL. Movable allocations don't
+ * participate (they have their own Pass 4 fallback path).
+ *
+ * Restricted to SB_TAINTED to avoid spreading mixing into
+ * clean SPBs.
+ */
+ opposite_mt = -1;
+ if (migratetype == MIGRATE_UNMOVABLE)
+ opposite_mt = MIGRATE_RECLAIMABLE;
+ else if (migratetype == MIGRATE_RECLAIMABLE)
+ opposite_mt = MIGRATE_UNMOVABLE;
+
+ if (opposite_mt >= 0) {
+ for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+ list_for_each_entry(sb,
+ &zone->spb_lists[SB_TAINTED][full], list) {
+ int co;
+
+ if (!sb->nr_free_pages)
+ continue;
+ for (co = min_t(int, pageblock_order - 1,
+ NR_PAGE_ORDERS - 1);
+ co >= (int)order;
+ --co) {
+ current_order = co;
+ area = &sb->free_area[current_order];
+ page = get_page_from_free_area(
+ area, opposite_mt);
+ if (!page)
+ continue;
+ if (get_pageblock_isolate(page))
+ continue;
+ if (is_migrate_cma(
+ get_pageblock_migratetype(page)))
+ continue;
+ page_del_and_expand(zone, page,
+ order, current_order,
+ opposite_mt);
+ __spb_set_has_type(page,
+ migratetype);
+ trace_mm_page_alloc_zone_locked(
+ page, order, migratetype,
+ pcp_allowed_order(order) &&
+ migratetype < MIGRATE_PCPTYPES);
+ return page;
+ }
+ }
+ }
+ }
+
+ /*
+ * Pass 2d: cross-MOV borrow within tainted SPBs.
+ *
+ * If Pass 1/2/2b/2c all failed, the next step is Pass 3
+ * which would taint a fresh clean SPB. Before that, try
+ * to borrow an individual buddy from a tainted SPB's
+ * MIGRATE_MOVABLE free list.
+ *
+ * Tainted SPBs accumulate large amounts of free space on
+ * the MOV free list (e.g. reclaimed page-cache pages
+ * whose pageblock tag is MOVABLE). Pass 1 cannot see
+ * those for non-movable allocs, Pass 2/2b cannot claim a
+ * whole pageblock when sb->nr_free == 0, and Pass 2c is
+ * restricted to UNMOV<->RECL. The result is a tainted
+ * SPB with tens to hundreds of thousands of free pages
+ * all unreachable from non-movable demand.
+ *
+ * Borrow semantics mirror Pass 2c: take a buddy from the
+ * MOVABLE free list without relabeling the source
+ * pageblock. The page is used for the requesting non-
+ * movable mt for the lifetime of the allocation, then on
+ * free returns to the MOVABLE list.
+ *
+ * Cost: the borrowed UNMOV/RECL content blocks
+ * compaction of its source pageblock until freed.
+ * Restricted to SB_TAINTED so the contamination is
+ * bounded to an already-tainted SPB; the alternative
+ * (Pass 3) taints a fresh clean SPB and removes a 1 GiB
+ * region from the clean pool, which is strictly worse.
+ *
+ * Skipped for movable allocs (they have Pass 4) and for
+ * CMA allocs.
+ */
+ if (!movable && !is_migrate_cma(migratetype)) {
+ for (full = SB_FULL; full < __NR_SB_FULLNESS; full++) {
+ list_for_each_entry(sb,
+ &zone->spb_lists[SB_TAINTED][full], list) {
+ int co;
+
+ if (!sb->nr_free_pages)
+ continue;
+ for (co = min_t(int, pageblock_order - 1,
+ NR_PAGE_ORDERS - 1);
+ co >= (int)order;
+ --co) {
+ current_order = co;
+ area = &sb->free_area[current_order];
+ page = get_page_from_free_area(
+ area, MIGRATE_MOVABLE);
+ if (!page)
+ continue;
+ if (get_pageblock_isolate(page))
+ continue;
+ if (is_migrate_cma(
+ get_pageblock_migratetype(page)))
+ continue;
+ page_del_and_expand(zone, page,
+ order, current_order,
+ MIGRATE_MOVABLE);
+ __spb_set_has_type(page,
+ migratetype);
+ trace_mm_page_alloc_zone_locked(
+ page, order, migratetype,
+ pcp_allowed_order(order) &&
+ migratetype < MIGRATE_PCPTYPES);
+ return page;
+ }
+ }
+ }
+ }
}

/*
--
2.54.0