Re: [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use
From: Ackerley Tng
Date: Mon Mar 09 2026 - 03:01:56 EST
Joshua Hahn <joshua.hahnjy@xxxxxxxxx> writes:
> On Wed, 25 Feb 2026 19:37:04 -0800 Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>
>> Joshua Hahn <joshua.hahnjy@xxxxxxxxx> writes:
>>
>> > On Wed, 11 Feb 2026 16:37:11 -0800 Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>> >
>> > Hi Ackerley, I hope you're doing well!
>> >
>> > [...snip...]
>> >
>> >> I would like to get feedback on:
>> >>
>> >> 1. Opening up HugeTLB's allocation for more generic use
>> >
>> > I'm not entirely familiar with guest_memfd, so please excuse my
>> > ignorance if I'm missing anything obvious.
>>
>> Happy to take questions! Thank you for your thoughts and reviews!
>
> Of course, thank you for your work, Ackerley!
>
>> > But I'm wondering what hugeTLB offers
>> > that other hugepage solutions cannot offer for guest_memfd, if the
>> > goal of this series is to decouple it from hugeTLBfs.
>> >
>>
>> The one other huge page source that we've explored is THP pages from the
>> buddy allocator. Compared to HugeTLB, huge pages from the buddy
>> allocator
>>
>> + Have a maximum size of 2M
>> + Are not guaranteed the way HugeTLB pages are - HugeTLB pages are
>> allocated at boot, and guest_memfd can reserve pages at guest_memfd
>> creation time.
>> + Are slower to allocate - allocating a HugeTLB page is just dequeuing
>> from a preallocated pool, which is really fast
>
> All of these make sense. Just wanted to know if guest_memfd had any
> unique use cases for HugeTLB that normal HugeTLBfs didn't have.
>
IIUC HugeTLB was meant to make huge pages available to userspace for
performance reasons, and guest_memfd wants HugeTLB for the same reason,
just for virtualization use cases. So no, I don't think there are any
specifically unique use cases.

These are the differences I can think of between guest_memfd and
HugeTLBfs's usage of HugeTLB:
+ guest_memfd may split HugeTLB pages to individual struct pages during
guest_memfd's ownership of the HugeTLB page. (The pages will be merged
before returning them to HugeTLB)
+ guest_memfd will provide an option to remove memory in guest_memfd
ownership from the kernel direct map - I think HugeTLB pages are
always in the direct map (?)
+ guest_memfd doesn't want to use HugeTLB surplus pages, for now
+ guest_memfd will reserve pages at fd creation time instead of at mmap
time. Reservation is done by creating a subpool, so guest_memfd
doesn't use resv_map.
>> The last reason to use HugeTLB is not because of any inherent advantage
>> of using HugeTLB over other sources of huge pages, but for
>> administrative/scheduling purposes:
>>
>> Given that existing non-guest_memfd workloads are already using
>> HugeTLB, for optimal scheduling, machine memory is already carved up
>> in HugeTLB pages for these workloads. Workloads that require using
>> guest_memfd (like Confidential VMs) must also use HugeTLB to
>> participate in optimal workload scheduling across machines.
>>
>>
>> [...snip...]
>>
>> On the other hand, reintroducing the charging protocol has the benefit
>> of avoiding allocations (not just dequeuing, if surplus HugeTLB pages
>> are required) if the memcg limit is hit. Also, if the original reason
>> for removing the protocol was to simplify the code, refactoring out
>> hugetlb_alloc_folio() also simplifies the code, and I think it's
>> actually nice that memcg charging is done the same way as the other two
>> (h_cg and h_cg_rsvd charging). After hugetlb_alloc_folio() is refactored
>> out, the gotos make all three charging systems consistent and symmetric,
>> which I think is nice to have :)
>>
>> I hope the consistent/symmetric charging among all 3 systems is welcome,
>> what do you think?
>
> For the hugetlbfs case, the path to allocate a HugeTLB page on demand
> makes sense, so I definitely see the argument for avoiding allocations.
> Does guest_memfd also have a path to allocate a HugeTLB page outside of
> the boot-time reservations? In that case I think it would be nice to
> clarify that the allocation failure case optimization is also for
> guest_memfd, not only for hugetlbfs.
>
For now, guest_memfd actually doesn't want to use surplus pages, so
guest_memfd won't be allocating pages outside of boot-time
reservations.
> Symmetric charging is definitely welcome :-) All of your reasons make
> sense to me, I just wanted to ask and make sure.
>
This change is mostly for (an alternate form of) simplicity :)
> Thanks for your thoughts! I hope you have a great day!!
> Joshua