Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order

From: JP Kobryn

Date: Tue Jun 16 2026 - 15:59:26 EST

On 5/28/26 6:57 AM, Vlastimil Babka (SUSE) wrote:
> On 5/27/26 07:57, JP Kobryn wrote:
>> On 5/25/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
>>> On 5/19/26 22:28, Johannes Weiner wrote:
>>>> On Mon, May 18, 2026 at 06:25:32PM -0700, JP Kobryn (Meta) wrote:
>>>> This is an interesting patch. A couple of thoughts:
>>>>
>>>> 1. You disabled the highatomic reserve for this workload and it didn't
>>>> seem to matter. Presumably <costly orders don't need the protection.
>>>>
>>>> 2. Maxing out the reserves is odd. ALLOC_HIGHATOMIC allocations will
>>>> try reserved space first,
>>> Hmm, but if the allocation succeeds before entering slowpath,
>>> ALLOC_HIGHATOMIC won't be set.
>>> But reserving another block should mean we already exhausted the
>>> reserved ones.
>>> Unreserving is only done when direct reclaim made some progress but failed
>>> to produce a page. But if it works, or kswapd does the job, we won't
>>> enter it?
>>
>> There was just no real pressure to invoke the unreserving. Let me know
>> if I'm misunderstanding the question.
>
> Sorry, it was more thinking out loud about Johannes' point than a question.
> Yeah it seems there was no real pressure to invoke unreserving.
>
> The reserving side is probably fine. Highatomic allocation will not try the
> already reserved blocks in the fastpath, which is maybe not ideal. But they
> will try them before reserving another block, and that's the important part.
>

I sent a patch [0] that addresses this.

>>>> and I'd expect things that are commonly
>>>> highatomic to be short-lived. Why don't we stop with a couple of
>>>> claimed highatomic blocks that get continuously recycled?
>>> Maybe it's some big burst of highatomic allocations that leads to the
>>> reservations and then they stay around "forever"?
>>
>> I should add to the changelog the missing info that high frequency
>> net allocations are responsible for these high atomic reservations.
>> Even though the allocations are not necessarily long-lived, the
>> pageblocks remain high atomic.
>
> OK, thanks for the info.
>
>>> If that's the case I think we should be perhaps looking at the unreserving
>>> being done more proactively, rather than limiting things to costly order.
>>
>> What are your thoughts if we instead look at it as: should we be reserving
>> full pageblocks for small allocations?
>
> Well, since migratetypes operate on the pageblock level, so do the
> highatomic reservations. That at least groups them together rather than
> scattering them all over random pageblocks?

Right, that's the trade-off. I'm not going to pursue this approach.
Instead, I've been looking for a more targeted fix in the relevant
allocator paths. See this patch [0].

>
>> It seems to come down to whether we want the disproportionate protection
>> of full pageblocks (below costly order) for high atomic allocs vs letting
>> them coalesce in the buddy path. Is the data not enough to justify the
>> latter?
>
> I still think the data shows we might be too lax in unreserving.

Ack.

>
>>>> 3. The impact on THP and compaction success rate is pretty
>>>> extreme. How can 1% of memory throw such a wrench into the gears?
>>> Maybe if ~all free memory is in the highatomic blocks, compaction can't be
>>> very effective. Or some suitability check somewhere in reclaim+compaction
>>> wrongly assumes the highatomic blocks are usable, so it won't do the work.
>>
>> I could be missing something, but I spent some time tonight looking into
>> this and didn't find an issue in the compaction/reclaim suitability path.
>>
>> __compaction_suitable() calls __zone_watermark_ok(), and that path
>> subtracts free MIGRATE_HIGHATOMIC pages from usable free memory for
>> callers without reserve access:
>>
>> 	/*
>> 	 * If the caller does not have rights to reserves below the min
>> 	 * watermark then subtract the free pages reserved for highatomic.
>> 	 */
>> 	if (likely(!(alloc_flags & ALLOC_RESERVES)))
>> 		unusable_free += READ_ONCE(z->nr_free_highatomic);
>>
>> So free highatomic pages are removed from the usable free count there.
>>
>> Also, the suitable-free-block check in __zone_watermark_ok() only treats
>> MIGRATE_HIGHATOMIC as usable when alloc_flags includes
>> ALLOC_HIGHATOMIC (or ALLOC_OOM). __compaction_suitable() passes
>> ALLOC_CMA here (not ALLOC_HIGHATOMIC), so I don't think compaction is
>> incorrectly treating free highatomic blocks as usable.
>
> OK, thanks for checking.
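
For anyone following along, here is a minimal userspace sketch of the
accounting described above. It is a toy model, not kernel code: the
sketch_* names, the struct, and the flag value are all made up for
illustration, and the real check also accounts for the request order
and lowmem reserves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define SKETCH_ALLOC_RESERVES 0x1u

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct sketch_zone {
	long nr_free;            /* all free pages in the zone */
	long nr_free_highatomic; /* free pages sitting in highatomic blocks */
};

/*
 * Free pages a caller may count on. Callers without reserve access do
 * not get to count free pages held in highatomic pageblocks.
 */
static long sketch_usable_free(const struct sketch_zone *z,
			       unsigned int alloc_flags)
{
	long unusable_free = 0;

	assert(z->nr_free >= z->nr_free_highatomic);

	if (!(alloc_flags & SKETCH_ALLOC_RESERVES))
		unusable_free += z->nr_free_highatomic;

	return z->nr_free - unusable_free;
}

/* Simplified watermark test: usable free pages must exceed the mark. */
static bool sketch_watermark_ok(const struct sketch_zone *z, long mark,
				unsigned int alloc_flags)
{
	return sketch_usable_free(z, alloc_flags) > mark;
}
```

With nr_free = 1000 and nr_free_highatomic = 900, a caller without
reserve access sees 100 usable pages in this model, so a mark of 200
fails for it even though the zone looks mostly free.
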
>
>> The only caveat I noticed is the fragmentation accounting side:
>> fill_contig_page_info() / fragmentation_index() appear to count
>> free_area[order].nr_free across migratetypes, so fragmentation scores
>> may look better than they really are. But that seems adjacent
>> to this patch.
>
> Right.
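
To make that caveat concrete, here is a toy tally of "suitable" free
blocks, loosely modeled on the per-order loop in fill_contig_page_info().
Everything named sketch_* is hypothetical. In particular, the kernel
keeps a single nr_free per order; the separate highatomic count exists
here only to show the difference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of orders for this sketch. */
#define SKETCH_NR_ORDERS 11

/* Hypothetical per-order free block counts, split out for illustration. */
struct sketch_free_counts {
	unsigned long total[SKETCH_NR_ORDERS];      /* all migratetypes */
	unsigned long highatomic[SKETCH_NR_ORDERS]; /* highatomic subset */
};

/*
 * Count free blocks that could satisfy a request of suitable_order,
 * expressed in units of that order. When exclude_highatomic is false,
 * blocks are counted across all migratetypes.
 */
static unsigned long sketch_suitable_blocks(const struct sketch_free_counts *c,
					    unsigned int suitable_order,
					    bool exclude_highatomic)
{
	unsigned long suitable = 0;
	unsigned int order;

	assert(suitable_order < SKETCH_NR_ORDERS);

	for (order = suitable_order; order < SKETCH_NR_ORDERS; order++) {
		unsigned long blocks = c->total[order];

		if (exclude_highatomic)
			blocks -= c->highatomic[order];

		/* One larger block holds several blocks of the target order. */
		suitable += blocks << (order - suitable_order);
	}

	return suitable;
}
```

If 8 of 10 free order-9 blocks are highatomic, the all-migratetype
tally reports 10 suitable order-9 blocks while only 2 are available to
an ordinary request.
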
>
>> I think though that by the time we consider reclaim or compaction we're
>> dealing with the aftermath. The patch prevents the problem from occurring
>> up front.
>
> But I think as a result the highatomic feature is effectively dead. Your
> results confirm there are no more Highatomic pageblocks and zero Atomic
> order-4+ allocations (actually it's weird there's still 1 highatomic
> pageblock with zero allocations that would reserve it, or is that a rounding
> error due to calculating average across multiple hosts?).

Likely a rounding issue.

>
> I think it's not a surprise that there are no costly highatomic allocation
> attempts, we've always said they are too easy to fail, so likely nobody even
> tries them. MIGRATE_HIGHATOMIC was introduced by Mel [1] and evaluated on
> order-1. Even the non-costly orders can fail of course and should have
> fallbacks, highatomic reserves are just supposed to make the success more
> likely as that improves e.g. the networking receive performance, and they do
> use non-costly orders.
>
> Did you observe no increase of net receive fallbacks due to this patch?
> Would that be a universal outcome? I.e. did highatomic reservations become
> obsolete thanks to other improvements to the page allocator since they were
> introduced? That would be great as we could remove it completely and
> simplify the code, but we don't know that yet.

See the separate patch [0] which takes a targeted approach on the
allocator path. It accounts for net fallbacks and should help napi/page
frag allocs in the fastpath.

>
> If there are still benefits, they probably should stay, but that means keep
> them working for non-costly orders, and we should fix the observed problems
> differently. I can see two directions to try in that order.

Ack.

>
> - You say there are "high frequency net allocations" so I assume they are
> ongoing. We could try modifying the fastpath __alloc_frozen_pages_noprof() to
> properly evaluate ALLOC_HIGHATOMIC and let them prefer the reserved blocks
> in cases that do not end up in __alloc_pages_slowpath(). This should ensure
> the reserved blocks are actually being used even if we are above low
> watermarks and don't enter the slowpath.

Yes, this can be seen in the separate patch [0].
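
As a rough illustration of the idea quoted above (and not a description
of how [0] implements it), here is a toy model in which an eligible
request tries the reserved pool first and falls back to the regular
pool. All sketch_* names are hypothetical, and eligibility is reduced to
"order > 0 and cannot block", which is a simplification of the real
flag logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a toy allocation was satisfied from. */
enum sketch_source {
	SKETCH_FROM_NONE,
	SKETCH_FROM_RESERVE,
	SKETCH_FROM_REGULAR,
};

/* Hypothetical pools: counts of free chunks, ignoring order for brevity. */
struct sketch_pools {
	unsigned int reserved_free; /* free chunks in highatomic blocks */
	unsigned int regular_free;  /* free chunks everywhere else */
};

/* Simplified eligibility: non-zero order and a caller that cannot block. */
static bool sketch_highatomic_eligible(unsigned int order, bool can_block)
{
	return order > 0 && !can_block;
}

/*
 * Eligible requests prefer the reserved pool, so reserved blocks get
 * used even when the regular pool still has free memory. Everything
 * else, and eligible requests that find the reserve empty, use the
 * regular pool.
 */
static enum sketch_source sketch_fastpath_alloc(struct sketch_pools *p,
						unsigned int order,
						bool can_block)
{
	assert(p != NULL);

	if (sketch_highatomic_eligible(order, can_block) &&
	    p->reserved_free > 0) {
		p->reserved_free--;
		return SKETCH_FROM_RESERVE;
	}

	if (p->regular_free > 0) {
		p->regular_free--;
		return SKETCH_FROM_REGULAR;
	}

	return SKETCH_FROM_NONE;
}
```

The point of the ordering is that the reserve is recycled by the
callers it exists for, instead of sitting idle while those callers
consume regular pageblocks.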

>
> - If that doesn't help and we still have unused highatomic pageblocks,
> figure out how that happens - is the highatomic allocation frequency higher
> at some point, resulting in their increase, and then it drops and they stay
> around? If yes, think about how to make the unreserving more aggressive than
> it currently is.
>
> [1]
> https://lore.kernel.org/all/1442832762-7247-10-git-send-email-mgorman@xxxxxxxxxxxxxxxxxxx/
>

The patch linked below [0] improves the allocator path. I'll explore
opportunities for unreserving.

[0] https://lore.kernel.org/all/20260616191420.52556-1-jp.kobryn@xxxxxxxxx/