Re: revisiting alloc_pages_bulks semantics?

From: Zi Yan

Date: Wed May 27 2026 - 04:34:22 EST


On 27 May 2026, at 16:00, Christoph Hellwig wrote:

> On Wed, May 27, 2026 at 03:53:53PM +0800, Zi Yan wrote:
>>> 1) early fail semantics
>>>
>>> alloc_pages_bulks can do partial allocations for some reasons, and
>>> users usually have a fallback by either looping and calling it again
>>> or falling back to single page allocations. This sucks! Why can't
>>> we get our usual try as hard as you can semantics, requiring
>>> GFP_NORETRY or similar to relax it?
>>
>> IIUC, current alloc_pages_bulks() tries to get free pages without doing
>> compaction or reclaim unless none can be allocated.
>
> Yes, which is really odd, as other page/folio allocators make that an
> opt-in through GFP flags.

Based on my understanding of the code, the GFP flags are respected by
the __alloc_pages_noprof() call in alloc_pages_bulk(). The
rmqueue_pcplist() loop is just a quick attempt to grab free pages.
I suspect it is quicker than calling __alloc_pages_noprof() in a loop,
since the other preparation work in __alloc_pages_noprof() is only
done once.
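
To make that two-phase shape concrete, here is a toy userspace model,
not kernel code. Every name in it (toy_zone, toy_prepare, toy_slow_alloc,
toy_bulk_alloc, may_reclaim) is made up for illustration, and it only
mirrors the behaviour described in this thread: one setup step, a quick
pass over the per-CPU list, and a single slow-path attempt only when the
quick pass got nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_zone {
	size_t pcp_free;   /* pages sitting on the per-CPU free list */
	size_t slow_free;  /* pages reachable only via reclaim/compaction */
	size_t prep_calls; /* how many times the setup work ran */
	size_t slow_calls; /* how many times the slow path ran */
};

/* Stand-in for the one-time preparation work. */
static void toy_prepare(struct toy_zone *z)
{
	z->prep_calls++;
}

/* Stand-in for the full allocator path that honours the GFP flags. */
static size_t toy_slow_alloc(struct toy_zone *z, int may_reclaim)
{
	z->slow_calls++;
	if (!may_reclaim || z->slow_free == 0)
		return 0;
	z->slow_free--;
	return 1;
}

/* Returns the number of pages obtained, which may be less than nr. */
static size_t toy_bulk_alloc(struct toy_zone *z, int may_reclaim, size_t nr)
{
	size_t got = 0;

	assert(z != NULL);
	toy_prepare(z);			/* done once for the whole batch */

	/* Quick pass: take what is already free, no reclaim. */
	while (got < nr && z->pcp_free > 0) {
		z->pcp_free--;
		got++;
	}

	/* Only if the quick pass got nothing, try one page the hard way. */
	if (got == 0 && nr > 0)
		got += toy_slow_alloc(z, may_reclaim);

	return got;
}
```

With 3 pages on the toy per-CPU list, a request for 8 returns 3 without
touching the slow path; the next request for 8 returns a single page
from the slow path, and only if may_reclaim allows it.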

>
>> Does your “usual try”
>> mean possible invocation of compaction and/or reclaim for every page
>> allocation?
>
> If you look at most callers in tree, and my recently merged or to be
> merged work isn't any different, they just bloody want the pages just
> as any other allocator. Failing under grave memory pressure is fine
> of course, but just failing because getting the memory requires effort
> is not.
>
>> I guess it also relates to the order > 0 bulk allocation
>> below? My gut feeling is that if one “usual try” fails, the following
>> “usual try” might not work. So making alloc_pages_bulks() do heavy
>> allocation might not buy you much.
>
> Well, we need to centralize this. Right now there is lots of diverging
> cargo culting in the callers.
>
>> But can you elaborate on why looping alloc_pages_bulks() does not work
>> well? That is essentially triggering compaction/reclaim repeatedly
>> like your proposed “usual try” idea.
>
> I'm not even sure if it works well. There are some callers that do that,
> some use individual fallbacks. I don't really want to think about that
> when all I need is a few folios.
>
>>> The bulk allocator is limited to order 0 which limits its usefulness
>>> these days. It would be really helpful to do bulk allocations for
>>> the pagecache or bounce buffering.
>>
>> Sounds reasonable to me, but when under memory pressure, I wonder
>> how many > order 0 folios you can get in the end. And that might
>> cause a storm of compaction and/or reclaim if combined with Idea 1.
>
> Well, I really want them. In some cases I might be fine falling down
> to smaller sizes, but I also really don't want the logic in every
> caller.

Based on your answers above, it sounds like a wrapper around
__alloc_pages_bulk() that allocates in a loop until all requested
pages are filled might be good enough for your case.
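
A rough userspace sketch of such a wrapper follows. It is not kernel
code: bulk_fn, bulk_alloc_all, toy_pool and toy_bulk are hypothetical
names, and the callback only imitates a bulk allocator that fills empty
array slots and returns the number of populated slots. Bailing out when
a call makes no progress is an assumption of this sketch; a real wrapper
would let the GFP flags decide when to stop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback standing in for the bulk allocator: fills
 * empty slots in pages[0..nr) and returns how many are populated.
 */
typedef size_t (*bulk_fn)(void *ctx, size_t nr, void **pages);

/*
 * Call the bulk allocator until the array is full, or until a call
 * makes no progress. Returns the number of populated slots.
 */
static size_t bulk_alloc_all(bulk_fn alloc, void *ctx, size_t nr,
			     void **pages)
{
	size_t filled = 0;

	assert(alloc != NULL && pages != NULL);
	while (filled < nr) {
		size_t now = alloc(ctx, nr, pages);

		if (now <= filled)	/* no forward progress: give up */
			break;
		filled = now;
	}
	return filled;
}

/* Toy allocator: fills at most per_call slots from a finite pool. */
struct toy_pool {
	size_t per_call;
	size_t remaining;
};

static size_t toy_bulk(void *ctx, size_t nr, void **pages)
{
	static char dummy_page;	/* every "page" points here */
	struct toy_pool *p = ctx;
	size_t populated = 0, given = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!pages[i] && given < p->per_call && p->remaining > 0) {
			pages[i] = &dummy_page;
			given++;
			p->remaining--;
		}
		if (pages[i])
			populated++;
	}
	return populated;
}
```

With a toy pool that hands out 3 pages per call, a request for 8 is
satisfied after three calls; with only 5 pages in the pool, the wrapper
returns 5 once a call adds nothing, and the caller sees the shortfall
in one place instead of open-coding the retry.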

But let me know if I am missing something.

>
>> For > order 0 bulk allocations, are you thinking about 1)
>> a try and bail-out early model or 2) a keep-trying model?
>
> Both are useful and as with other allocators should depend on the
> passed in GFP flags.

Like I said above, __alloc_pages_noprof() in alloc_pages_bulk()
respects the GFP flags.

>
>> For the latter, I wonder how large the allocation latency can be
>> and if that is tolerable or even makes sense, since for THP
>> allocations, we have seen >30s allocation latency when under
>> memory pressure. Is waiting minutes for bulk > order 0 allocation
>> making sense in your use cases?
>
> The allocations I have in mind would only require try hard allocations
> for typical file system block sizes (64k at most), while everything
> larger is fair game for falling back.

Sure. In MM, PAGE_ALLOC_COSTLY_ORDER is 3, so allocations above that
order (64KB is order 4 with 4KB base pages) take more effort to get
and the allocation latency can be longer. So it might take a long time
to allocate the last 64KB page in a bulk allocation.
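
The order arithmetic behind that can be checked with a few lines of
userspace C. This is only a sketch: TOY_PAGE_SIZE assumes 4KB base
pages, TOY_COSTLY_ORDER copies the value 3 mentioned above, and
toy_order_for() and toy_is_costly() are made-up helpers, not kernel
functions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096UL	/* assumption: 4KB base pages */
#define TOY_COSTLY_ORDER	3	/* value of PAGE_ALLOC_COSTLY_ORDER */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_order_for(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((TOY_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* "Costly" here means strictly above the costly order. */
static int toy_is_costly(size_t size)
{
	return toy_order_for(size) > TOY_COSTLY_ORDER;
}
```

Under these assumptions 32KB maps to order 3 and is not costly, while
64KB maps to order 4 and is. With a larger base page size the same
64KB block would land at a lower order.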

I do not have any data for such scenarios, but one trick I can think
of is to ask compaction and reclaim to aim for several free pages at
the requested order (not at a higher order) instead of just one, so
that after one round of compaction and/or reclaim, more pages at the
requested order can be allocated.


Best Regards,
Yan, Zi