Re: [PATCH] mm/page_alloc: skip high atomic reservation at or below costly order

From: Vlastimil Babka (SUSE)

Date: Thu May 28 2026 - 10:03:55 EST

On 5/27/26 07:57, JP Kobryn wrote:
> On 5/25/26 2:11 AM, Vlastimil Babka (SUSE) wrote:
>> On 5/19/26 22:28, Johannes Weiner wrote:
>>> On Mon, May 18, 2026 at 06:25:32PM -0700, JP Kobryn (Meta) wrote:
>>> This is an interesting patch. A couple of thoughts:
>>>
>>> 1. You disabled the highatomic reserve for this workload and it didn't
>>> seem to matter. Presumably <costly orders don't need the protection.
>>>
>>> 2. Maxing out the reserves is odd. ALLOC_HIGHATOMIC allocations will
>>> try reserved space first,
>> Hmm, but if the allocation succeeds before entering slowpath,
>> ALLOC_NON_BLOCK won't be set.
>> But reserving another block should mean we already exhausted the
>> reserved ones.
>> Unreserving is only done when direct reclaim made some progress but failed
>> to produce a page. But if it works, or kswapd does the job, we won't
>> enter it?
>
> There was just no real pressure to invoke the unreserving. Let me know
> if I'm misunderstanding the question.

Sorry, it was more thinking out loud about Johannes' point than a question.
Yeah it seems there was no real pressure to invoke unreserving.

The reserving side is probably fine. Highatomic allocations will not try the
already reserved blocks in the fastpath, which is maybe not ideal. But they
will try them before reserving another block, and that's the important part.

>>> and I'd expect things that are commonly
>>> highatomic to be short-lived. Why don't we stop with a couple of
>>> claimed highatomic blocks that get continuously recycled?
>> Maybe it's some big burst of highatomic allocations that leads to the
>> reservations and then they stay around "forever"?
>
> I should add to the changelog the missing info that high frequency
> net allocations are responsible for these high atomic reservations.
> Even though the allocations are not necessarily long-lived, the
> pageblocks remain high atomic.

OK, thanks for the info.

>> If that's the case I think we should be perhaps looking at the unreserving
>> being done more proactively, rather than limiting things to costly order.
>
> What are your thoughts if we instead look at it as: should we be reserving
> full pageblocks for small allocations?

Well, since migratetypes operate on the pageblock level, so do the
highatomic reservations. That at least groups them together instead of
scattering them all over random pageblocks?

> It seems to come down to whether we want the disproportionate protection
> of full pageblocks (below costly order) for high atomic allocs vs letting
> them coalesce in the buddy path. Is the data not enough to justify the
> latter?

I still think the data shows we might be too lax in unreserving.

>>> 3. The impact on THP and compaction success rate is pretty
>>> extreme. How can 1% of memory throw such a wrench into the gears?
>> Maybe if ~all free memory is in the highatomic blocks, compaction can't be
>> very effective. Or some suitability check somewhere in reclaim+compaction
>> wrongly assumes the highatomic blocks are usable, so it won't do the work.
>
> I could be missing something, but I spent some time tonight looking into
> this and didn't find an issue in the compaction/reclaim suitability path.
>
> __compaction_suitable() calls __zone_watermark_ok(), and that path
> subtracts free MIGRATE_HIGHATOMIC pages from usable free memory for
> callers without reserve access:
>
>         /*
>          * If the caller does not have rights to reserves below the min
>          * watermark then subtract the free pages reserved for highatomic.
>          */
>         if (likely(!(alloc_flags & ALLOC_RESERVES)))
>                 unusable_free += READ_ONCE(z->nr_free_highatomic);
>
> So free highatomic pages are removed from the usable free count there.
>
> Also, the suitable-free-block check in __zone_watermark_ok() only treats
> MIGRATE_HIGHATOMIC as usable when alloc_flags includes
> ALLOC_HIGHATOMIC (or ALLOC_OOM). __compaction_suitable() passes
> ALLOC_CMA here (not ALLOC_HIGHATOMIC), so I don't think compaction is
> incorrectly treating free highatomic blocks as usable.

OK, thanks for checking.
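
To make the accounting in the quoted snippet concrete, here is a minimal
userspace sketch, not kernel code. The struct, the flag value and the function
names are made up for illustration; only the rule itself (callers without
reserve access do not get to count free highatomic pages) comes from the
snippet above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this sketch only; not the kernel's definition. */
#define SKETCH_ALLOC_RESERVES 0x1u

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct zone_sketch {
	long nr_free;            /* all free pages in the zone */
	long nr_free_highatomic; /* free pages sitting in highatomic pageblocks */
};

/* Free pages the caller may count on, given its (sketch) alloc flags. */
static long usable_free_pages(const struct zone_sketch *z,
			      unsigned int alloc_flags)
{
	long unusable_free = 0;

	assert(z->nr_free_highatomic <= z->nr_free);

	/* No reserve access: free highatomic pages do not count as usable. */
	if (!(alloc_flags & SKETCH_ALLOC_RESERVES))
		unusable_free += z->nr_free_highatomic;

	return z->nr_free - unusable_free;
}

/* Simplified watermark test: usable free pages must exceed the mark. */
static bool watermark_ok_sketch(const struct zone_sketch *z, long mark,
				unsigned int alloc_flags)
{
	return usable_free_pages(z, alloc_flags) > mark;
}
```

With 1000 free pages of which 900 are highatomic, a caller without reserve
access sees only 100 usable pages, so a mark of 200 fails for it even though
the zone looks mostly free.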

> The only caveat I noticed is the fragmentation accounting side:
> fill_contig_page_info() / fragmentation_index() appear to count
> free_area[order].nr_free across migratetypes, so fragmentation scoring
> may look better than it really is. But that seems adjacent
> to this patch.

Right.
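
A toy illustration of that caveat, as a userspace sketch under assumptions:
the per-migratetype counter array, the three migratetypes and all names here
are hypothetical, and this is not how fill_contig_page_info() is written. It
only shows how counting free blocks across all migratetypes can overstate what
a caller that cannot use highatomic blocks would actually find.

```c
#include <assert.h>

#define SKETCH_NR_ORDERS 11 /* assumed number of buddy orders in this sketch */

/* Hypothetical, reduced set of migratetypes. */
enum sketch_mt { SK_MOVABLE, SK_UNMOVABLE, SK_HIGHATOMIC, SK_NR_TYPES };

/* Hypothetical per-order, per-migratetype free block counters. */
struct free_counts {
	unsigned long nr_free[SKETCH_NR_ORDERS][SK_NR_TYPES];
};

/*
 * Count free blocks that could satisfy a request of the given order.
 * A free block of order o >= order contributes 2^(o - order) such blocks.
 * With skip_highatomic set, highatomic blocks are left out of the count.
 */
static unsigned long free_blocks_suitable(const struct free_counts *fc,
					  unsigned int order,
					  int skip_highatomic)
{
	unsigned long total = 0;

	assert(order < SKETCH_NR_ORDERS);

	for (unsigned int o = order; o < SKETCH_NR_ORDERS; o++) {
		for (int mt = 0; mt < SK_NR_TYPES; mt++) {
			if (skip_highatomic && mt == SK_HIGHATOMIC)
				continue;
			total += fc->nr_free[o][mt] << (o - order);
		}
	}
	return total;
}
```

If two of three free order-9 blocks are highatomic, the all-migratetypes count
reports three suitable order-9 blocks while only one is usable by a normal
allocation.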

> I think though that by the time we consider reclaim or compaction we're
> dealing with the aftermath. The patch prevents the problem from occurring
> up front.

But I think as a result the highatomic feature is effectively dead. Your
results confirm there are no more Highatomic pageblocks and zero Atomic
order-4+ allocations (actually it's weird there's still 1 highatomic
pageblock with zero allocations that would reserve it, or is that a rounding
error due to calculating average across multiple hosts?).

I think it's not a surprise that there are no costly highatomic allocation
attempts, we've always said they are too easy to fail, so likely nobody even
tries them. MIGRATE_HIGHATOMIC was introduced by Mel [1] and evaluated on
order-1. Even the non-costly orders can fail of course and should have
fallbacks, highatomic reserves are just supposed to make the success more
likely as that improves e.g. the networking receive performance, and they do
use non-costly orders.

Did you observe no increase of net receive fallbacks due to this patch?
Would that be a universal outcome? I.e. did highatomic reservations become
obsolete thanks to other improvements to the page allocator since they were
introduced? That would be great as we could remove it completely and
simplify the code, but we don't know that yet.

If there are still benefits, they probably should stay, but that means keeping
them working for non-costly orders, and we should fix the observed problems
differently. I can see two directions to try, in this order:

- You say there are "high frequency net allocations" so I assume they are
ongoing. We could try modifying the fastpath __alloc_frozen_pages_noprof() to
properly evaluate ALLOC_HIGHATOMIC and let them prefer the reserved blocks
in cases that do not end up in __alloc_pages_slowpath(). This should ensure
the reserved blocks are actually being used even if we are above low
watermarks and don't enter the slowpath.

- If that doesn't help and we still have unused highatomic pageblocks,
figure out how that happens - is the highatomic allocation frequency higher
at some point, resulting in their increase, and then it drops and they stay
around? If yes, think about how to make the unreserving more aggressive than
it currently is.
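
To illustrate the first direction, a toy userspace model, not a proposed
implementation. It ignores orders, contiguity, watermarks and the pcp lists
entirely, and every name in it is hypothetical. The only point is the ordering:
with prefer_reserved set, a highatomic request takes from the reserved pool
first, so the reserved blocks get used even when general free memory is
plentiful; without it, they are only touched as a last resort.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the toy allocation was served from. */
enum sketch_source { SRC_NONE, SRC_HIGHATOMIC, SRC_GENERAL };

/* Hypothetical zone model: just two page counters. */
struct zone_model {
	long free_highatomic; /* free pages in reserved highatomic blocks */
	long free_general;    /* free pages everywhere else */
};

static enum sketch_source alloc_model(struct zone_model *z, long nr_pages,
				      bool highatomic, bool prefer_reserved)
{
	assert(nr_pages > 0);

	/* Proposed ordering: highatomic requests try reserved blocks first. */
	if (highatomic && prefer_reserved &&
	    z->free_highatomic >= nr_pages) {
		z->free_highatomic -= nr_pages;
		return SRC_HIGHATOMIC;
	}

	/* Everyone may use general free memory. */
	if (z->free_general >= nr_pages) {
		z->free_general -= nr_pages;
		return SRC_GENERAL;
	}

	/* Last resort: highatomic requests may still dip into the reserve. */
	if (highatomic && z->free_highatomic >= nr_pages) {
		z->free_highatomic -= nr_pages;
		return SRC_HIGHATOMIC;
	}

	return SRC_NONE;
}
```

In this model, with prefer_reserved off and plenty of general memory, the
reserved pool is never drawn from, which is one way the reserved blocks could
sit unused while still being excluded from everyone else's usable memory.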

[1]
https://lore.kernel.org/all/1442832762-7247-10-git-send-email-mgorman@xxxxxxxxxxxxxxxxxxx/