Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order
From: JP Kobryn
Date: Tue May 26 2026 - 22:35:34 EST
On 5/19/26 1:28 PM, Johannes Weiner wrote:
On Mon, May 18, 2026 at 06:25:32PM -0700, JP Kobryn (Meta) wrote:
We're seeing a pattern in production where 2MB THP order-9 allocations are
failing due to fragmentation and triggering reclaim on systems with plenty
of free memory. Over time, the success rate of these THP allocations does
not increase at all.
Inspecting zone->vm_stat[NR_FREE_PAGES] via kprobe on compaction_suitable()
indicated the given zone had sufficient free pages for order-9 allocations,
yet they were going unused. Drilling down into the zone and inspecting
/proc/pagetypeinfo revealed why. Order-9 blocks were accumulating in the
zone's HighAtomic bucket (while zero were present in Movable). THP is
unable to draw blocks from HighAtomic since that bucket is not in the
fallback list.
The heuristic for reserving pageblocks in HighAtomic is that any atomic
allocation greater than order-0 will result in the full pageblock being
captured. This means that an order-1 atomic allocation will over-reserve by
256x, a full 512-page pageblock.
Gate the reservation on order. Skip for allocations at or below
PAGE_ALLOC_COSTLY_ORDER. This prevents smaller atomic allocations from
reserving entire pageblocks, and significantly helps when THP is in use on
a fragmented but otherwise healthy system.
Testing was performed using an A/B Instagram workload receiving prod
traffic. Each side had ~60 hosts with 64G memory. The patch resulted in
several gains:
Unpatched
HighAtomic pageblocks per host: 309-312 (1% of zone or 620MB),
...all order-9 blocks in HighAtomic
THP success rate: 1-6%
Compaction success rate: 0-2%
pgscan_kswapd (total across ~60 hosts, per minute): ~70.2M
Atomic order-4+ allocations: 0
Patched
HighAtomic pageblocks per host: 1
THP success rate: 44-78%
Compaction success rate: 24-47%
pgscan_kswapd (total across ~60 hosts, per minute): ~29.9M
Atomic order-4+ allocations: 0
This is an interesting patch. A couple of thoughts:
1. You disabled the highatomic reserve for this workload and it didn't
seem to matter. Presumably <costly orders don't need the protection.
Right. Although one detail I realize is I should also consider
pageblock_order as well to avoid any config issue.
2. Maxing out the reserves is odd. ALLOC_HIGHATOMIC allocations will
try reserved space first, and I'd expect things that are commonly
highatomic to be short-lived. Why don't we stop with a couple of
claimed highatomic blocks that get continuously recycled?
Even though they may be short-lived, the data shows the volume of
allocations is steady enough to keep the reserves maxed out.
3. The impact on THP and compaction success rate is pretty
extreme. How can 1% of memory throw such a wrench into the gears?
Looking at the pre-patch high atomic pageblock counts, that's ~300
pageblocks that could've been used for THPs. They become usable after
the patch.
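For scale, the pre-patch reserve size can be worked out from the figures quoted above. The following is a minimal sketch, assuming 4 KiB base pages and a pageblock order of 9 (consistent with 2MB order-9 THPs, but not stated explicitly in the thread). The `hypo_` helper names are hypothetical and exist only for this illustration. The share is computed against total host memory, which is only an approximation of the zone size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB base pages, pageblock order 9. */
#define HYPO_PAGE_SIZE_BYTES 4096ULL
#define HYPO_PAGEBLOCK_ORDER 9U

/* Number of base pages in a block of the given order. */
static inline uint64_t hypo_pages_in_order(unsigned int order)
{
	return 1ULL << order;
}

/* Memory covered by nr_pageblocks whole pageblocks, in MiB. */
static inline uint64_t hypo_pageblocks_to_mib(uint64_t nr_pageblocks)
{
	uint64_t bytes = nr_pageblocks *
			 hypo_pages_in_order(HYPO_PAGEBLOCK_ORDER) *
			 HYPO_PAGE_SIZE_BYTES;

	return bytes >> 20;
}

/*
 * Share of total memory held by the reserve, in hundredths of a percent
 * (integer division, so the result is rounded down).
 */
static inline uint64_t hypo_reserve_share_bp(uint64_t nr_pageblocks,
					     uint64_t total_mib)
{
	assert(total_mib > 0);
	return hypo_pageblocks_to_mib(nr_pageblocks) * 10000ULL / total_mib;
}
```

Under these assumptions, `hypo_pageblocks_to_mib(310)` evaluates to 620, matching the 620MB figure in the test results, and against 64 GiB of host memory that is a little under 1%.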
Have you tried this with other workloads?
No, but the pre-patch symptoms will show up on workloads where net
allocs are frequent enough to keep the high atomic pageblock count up.
Memory size of hosts involved is a factor as well since it's possible for a
majority of order-9 pages to be stuck in high atomic.
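The reservation heuristic and the gate described in the patch description can be modeled in userspace as below. This is a sketch of the behavior as stated in the thread, not the patch itself and not the kernel's code: the `model_` names are hypothetical, PAGE_ALLOC_COSTLY_ORDER is mirrored as 3, and a pageblock order of 9 is assumed. The pageblock_order consideration mentioned in the reply is deliberately left out because the thread does not say how it would be handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors PAGE_ALLOC_COSTLY_ORDER (3) under a model-only name. */
#define MODEL_COSTLY_ORDER    3U
/* Assumed pageblock order: 512 base pages per pageblock. */
#define MODEL_PAGEBLOCK_ORDER 9U

/*
 * How many times larger a full pageblock is than the request that
 * triggered the reservation. Only defined up to the pageblock order.
 */
static inline unsigned long model_over_reserve_factor(unsigned int order)
{
	assert(order <= MODEL_PAGEBLOCK_ORDER);
	return 1UL << (MODEL_PAGEBLOCK_ORDER - order);
}

/*
 * Unpatched heuristic as described in the thread: any atomic request
 * above order-0 captures a full pageblock.
 */
static inline bool model_reserves_unpatched(unsigned int order)
{
	return order > 0;
}

/*
 * Proposed gate as described in the thread: skip the reservation for
 * requests at or below the costly order.
 */
static inline bool model_reserves_patched(unsigned int order)
{
	return order > MODEL_COSTLY_ORDER;
}
```

In this model an order-1 request gives `model_over_reserve_factor(1) == 256`, the 256x figure from the patch description. It reserves a pageblock under the unpatched predicate but not under the patched one, while an order-4 request still reserves in both cases.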