Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order
From: Andrew Morton
Date: Tue May 19 2026 - 15:28:00 EST
On Mon, 18 May 2026 18:25:32 -0700 "JP Kobryn (Meta)" <jp.kobryn@xxxxxxxxx> wrote:
> We're seeing a pattern in production where 2MB THP order-9 allocations are
> failing due to fragmentation and triggering reclaim on systems with plenty
> of free memory. Over time, the success rate of these THP allocations does not
> increase at all.
>
> Inspecting zone->vm_stat[NR_FREE_PAGES] via a kprobe on compaction_suitable()
> indicated the given zone had sufficient free pages for order-9 allocations,
> yet they were going unused. Drilling down into the zone and inspecting
> /proc/pagetypeinfo revealed why. Order-9 blocks were accumulating in the
> zone's HighAtomic bucket (while zero were present in Movable). THP is
> unable to draw blocks from HighAtomic since that bucket is not in the
> fallback list.
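
The failure mode described above can be sketched as a small user-space model. This is a hypothetical simplification, not kernel code: the enum names, can_alloc_order9(), and the two-entry fallback rows are assumptions modeled loosely on the kernel's fallbacks[] table, which lets each regular migratetype steal from the others but does not list MIGRATE_HIGHATOMIC as a fallback for anyone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified migratetypes; not the kernel's enum. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_HIGHATOMIC, MT_NR };

/*
 * Fallback order modeled on the kernel's fallbacks[] table: each regular
 * type may steal from the other two regular types. No row exists for
 * MT_HIGHATOMIC, and no row lists it as a fallback.
 */
static const enum mt fallbacks[MT_HIGHATOMIC][2] = {
	[MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
	[MT_RECLAIMABLE] = { MT_UNMOVABLE, MT_MOVABLE },
};

/*
 * Return true if a request of regular type @want can find a free order-9
 * block on its own free list or on one of its fallback lists.
 * @free_o9 holds the number of free order-9 blocks per migratetype.
 */
static bool can_alloc_order9(const int free_o9[MT_NR], enum mt want)
{
	/* Only regular types have a fallback row. */
	assert(want < MT_HIGHATOMIC);

	if (free_o9[want] > 0)
		return true;
	for (int i = 0; i < 2; i++)
		if (free_o9[fallbacks[want][i]] > 0)
			return true;
	return false;
}
```

With free order-9 blocks present only in MT_HIGHATOMIC (the shape reported from /proc/pagetypeinfo), a movable THP request returns false even though the zone's free page count looks healthy.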
>
> The heuristic for reserving pageblocks in HighAtomic is that any atomic
> allocation greater than order-0 will result in the full pageblock being
> captured. This means that an order-1 atomic allocation will over-reserve by
> 256x, capturing a full 512-page pageblock.
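
The over-reservation factor is simply pages per pageblock divided by pages per allocation. A minimal sketch, assuming 4KB pages and pageblock_order 9 (512 pages, 2MB) as on typical x86-64 configurations; overreserve_factor() is a hypothetical helper for illustration only:

```c
#include <assert.h>

/* Assumption: 4KB pages and pageblock_order == 9 (512 pages, 2MB). */
#define PAGEBLOCK_ORDER    9
#define PAGEBLOCK_NR_PAGES (1UL << PAGEBLOCK_ORDER)

/*
 * How many times more memory a whole reserved pageblock holds than an
 * order-@order allocation (2^order pages) actually needs.
 */
static unsigned long overreserve_factor(unsigned int order)
{
	assert(order <= PAGEBLOCK_ORDER);
	return PAGEBLOCK_NR_PAGES >> order;
}
```

Under these assumptions, orders 1, 2 and 3 give factors of 256, 128 and 64, while an order-9 allocation uses its pageblock exactly.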
>
> Gate the reservation on order. Skip for allocations at or below
> PAGE_ALLOC_COSTLY_ORDER. This prevents smaller atomic allocations from
> reserving entire pageblocks, and significantly helps when THP is in use on
> a fragmented but otherwise healthy system.
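
The effect of the proposed gate can be sketched as a user-space model of the early returns in reserve_highatomic_pageblock(). This is a hypothetical simplification: would_reserve() and its parameters are invented for illustration, the 1%-of-zone cap follows the comment visible in the hunk context below as I read the upstream function, and the real function performs further checks that this model leaves out. Sizes are in pages, assuming 512-page pageblocks.

```c
#include <stdbool.h>

/* PAGE_ALLOC_COSTLY_ORDER is 3 in the kernel. */
#define PAGE_ALLOC_COSTLY_ORDER 3
/* Assumption: 4KB pages and pageblock_order == 9, i.e. 512 pages per pageblock. */
#define PAGEBLOCK_NR_PAGES 512UL

/*
 * Hypothetical model: returns true if an atomic allocation of @order would
 * cause a new HighAtomic pageblock to be reserved in a zone with
 * @managed_pages pages, @nr_reserved of which are already reserved.
 */
static bool would_reserve(unsigned int order, unsigned long managed_pages,
			  unsigned long nr_reserved, bool patched)
{
	unsigned long one_pct = managed_pages / 100;
	unsigned long max_reserved;

	/* Per the patch description, only atomic allocations above order 0 reserve. */
	if (order == 0)
		return false;

	/* The proposed gate: orders 1-3 never capture a whole pageblock. */
	if (patched && order <= PAGE_ALLOC_COSTLY_ORDER)
		return false;

	/* If 1% of the zone is smaller than one pageblock, reserve nothing. */
	if (one_pct < PAGEBLOCK_NR_PAGES)
		return false;

	/* Cap: roughly 1% of the zone, rounded up to whole pageblocks. */
	max_reserved = (one_pct + PAGEBLOCK_NR_PAGES - 1) / PAGEBLOCK_NR_PAGES *
		       PAGEBLOCK_NR_PAGES;
	return nr_reserved < max_reserved;
}
```

In this model an order-1 request reserves a pageblock only without the patch, orders 4 and above behave the same either way, and the ~1% cap is what bounds reservations, consistent with the ~310 pageblocks per host reported for the unpatched side.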
>
> Testing was performed using an A/B Instagram workload receiving production
> traffic. Each side had ~60 hosts with 64GB of memory. The patch resulted in
> several gains:
>
> Unpatched
> HighAtomic pageblocks per host: 309-312 (~1% of the zone, ~620MB),
> ...all order-9 blocks in HighAtomic
> THP success rate: 1-6%
> Compaction success rate: 0-2%
> pgscan_kswapd (total across ~60 hosts, per minute): ~70.2M
> Atomic order-4+ allocations: 0
>
> Patched
> HighAtomic pageblocks per host: 1
> THP success rate: 44-78%
> Compaction success rate: 24-47%
> pgscan_kswapd (total across ~60 hosts, per minute): ~29.9M
> Atomic order-4+ allocations: 0
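
The headline numbers can be cross-checked with a little arithmetic, assuming 2MB pageblocks (order 9 with 4KB pages):

```latex
\begin{aligned}
\text{HighAtomic, unpatched} &\approx 310 \times 2\,\text{MB} = 620\,\text{MB per host} \\
\text{HighAtomic, patched}   &= 1 \times 2\,\text{MB} = 2\,\text{MB per host} \\
\text{pgscan\_kswapd reduction} &= \frac{70.2\text{M} - 29.9\text{M}}{70.2\text{M}} \approx 0.57
\end{aligned}
```

That is, the patched side scanned roughly 57% fewer pages in kswapd per minute across the fleet.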
>
> Note that for this workload all atomic allocations were order 0-3,
> originating from the network stack, btrfs, and the scheduler.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3446,6 +3446,13 @@ static void reserve_highatomic_pageblock(struct page *page, int order,
> int mt;
> unsigned long max_managed;
>
> + /*
> + * Don't reserve a pageblock for lower orders.
> + * Order 1-3 allocs should not capture a huge page size block.
> + */
> + if (order <= PAGE_ALLOC_COSTLY_ORDER)
> + return;
> +
> /*
> * The number reserved as: minimum is 1 pageblock, maximum is
> * roughly 1% of a zone. But if 1% of a zone falls below a
Sashiko asked:
: Does skipping the HighAtomic reservation for orders 1-3 break the
: anti-fragmentation guarantees for these atomic allocations?
:
: The MIGRATE_HIGHATOMIC reserve protects high-order atomic allocations
: from failing under fragmentation by taking ownership of the entire
: pageblock.
:
: If order-1 through order-3 atomic allocations fall back to stealing
: pages, but the pageblock remains in its original migratetype, won't
: order-0 non-atomic allocations consume the remaining contiguous space?
:
: Under memory pressure, this could leave no contiguous blocks for atomic
: allocations to steal. Because these atomic allocations cannot trigger
: direct reclaim or compaction, they might fail, potentially leading to
: dropped packets or I/O errors in subsystems like the network stack or
: BTRFS.
:
: Could background compaction or khugepaged be used to unreserve
: HighAtomic blocks dynamically instead of disabling the reserve for
: these orders?