Re: [RFC PATCH v1 0/7] Open HugeTLB allocation routine for more generic use
From: Joshua Hahn
Date: Thu Feb 26 2026 - 13:52:41 EST
On Wed, 25 Feb 2026 19:37:04 -0800 Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
> Joshua Hahn <joshua.hahnjy@xxxxxxxxx> writes:
>
> > On Wed, 11 Feb 2026 16:37:11 -0800 Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
> >
> > Hi Ackerley, I hope you're doing well!
> >
> > [...snip...]
> >
> >> I would like to get feedback on:
> >>
> >> 1. Opening up HugeTLB's allocation for more generic use
> >
> > I'm not entirely familiar with guest_memfd, so please excuse my ignorance
> > if I'm missing anything obvious.
>
> Happy to take questions! Thank you for your thoughts and reviews!
Of course, thank you for your work, Ackerley!
> > But I'm wondering what hugeTLB offers
> > that other hugepage solutions cannot offer for guest_memfd, if the
> > goal of this series is to decouple it from hugeTLBfs.
> >
>
> The one other huge page source that we've explored is THP pages from the
> buddy allocator. Compared to HugeTLB, huge pages from the buddy
> allocator
>
> + Has a maximum size of 2M
> + Does not guarantee huge pages the way HugeTLB does - HugeTLB pages are
> allocated at boot, and guest_memfd can reserve pages at guest_memfd
> creation time.
> > + Allocation of HugeTLB pages is also really fast: it's just dequeuing
> > from a preallocated pool
All of these make sense. I just wanted to know if guest_memfd had any
unique use cases for hugeTLB that normal hugetlbfs didn't have.
> The last reason to use HugeTLB is not because of any inherent advantage
> of using HugeTLB over other sources of huge pages, but for
> administrative/scheduling purposes:
>
> Given that existing non-guest_memfd workloads are already using
> HugeTLB, for optimal scheduling, machine memory is already carved up
> in HugeTLB pages for these workloads. Workloads that require using
> guest_memfd (like Confidential VMs) must also use HugeTLB to
> participate in optimal workload scheduling across machines.
>
> >> 2. Reverting and re-adopting the try-commit-cancel protocol for memory
> >> charging
> >
> > On the second point, I am wondering if reintroducing the try-commit-cancel
> > protocol is tied to factoring out hugetlb_alloc_folio. When I removed
> > the protocol a while back, the justification was that for the most part,
> > grabbing a hugetlb folio was a relatively cheap & fast operation, since
> > hugetlb mostly operates out of a preallocated pool.
> >
> > So the cost of being wrong, going above the limit, and having to return
> > the hugetlb folio was also relatively low.
> >
>
> Thanks for this! I saw your patch to just optimistically grab a HugeTLB
> page :) For that patch, the primary reason was to simplify the logic,
> and the simplification was justifiable because grabbing a folio is
> cheap, right? (And so grabbing a folio being cheap wasn't a reason in
> itself?)
Yes, exactly!
> > It seems like this patch series introduces some new paths for hugetlb
> > pages to be consumed (specifically, without a reservation or vma).
> > I imagine that these new paths make the slowpath for hugetlb more frequent,
> > which makes the cost of assuming that the memcg limit is OK higher?
> > I think explicitly spelling this out in the justification for reintroducing
> > the charging protocol could be helpful.
> >
>
> Yes, I should have done that. Will copy the following to the next
> revision.
Thank you for considering!
> The main reason is that reintroducing the charging protocol is the
> clearest way (for me) to cleanly refactor out hugetlb_alloc_folio()
> without worrying about the edge cases around HugeTLB reservations and
> charging.
>
> If I didn't reintroduce the charging protocol, I would have to depend on
> freeing the new hugetlb folio on memcg charging failure, and the freeing
> in turn depends on the subpool correctly being set in the folio, and the
> presence of the subpool influences (in free_huge_folio()) whether the
> reservation was returned to the global hstate. Aaannnd... there's also a
> hugetlb_restore_reserve flag that controls whether to return the folio
> to the subpool (and the hstate). I find folio_clear_hugetlb_restore_reserve()
> on certain code paths kind of magical/unexplained too.
I see. If it makes the code simpler to introduce the protocol again, I see
no reason why we shouldn't revert the patch :-)
> I would rather iron out those charging and reservation details
> separately from this series (with more testing support).
>
> On the other hand, reintroducing the charging protocol has the benefit
> of avoiding allocations (not just dequeuing, if surplus HugeTLB pages
> are required) if the memcg limit is hit. Also, if the original reason
> for removing the protocol was to simplify the code, refactoring out
> hugetlb_alloc_folio() also simplifies the code, and I think it's
> actually nice that memcg charging is done the same way as the other two
> (h_cg and h_cg_rsvd charging). After hugetlb_alloc_folio() is refactored
> out, the gotos make all three charging systems consistent and symmetric,
> which I think is nice to have :)
>
> I hope the consistent/symmetric charging among all 3 systems is welcome,
> what do you think?
For the hugetlbfs case, the path to allocate a hugeTLB page on demand
makes sense, so I definitely see the argument for avoiding allocations.
Does guest_memfd also have a path to allocate a hugeTLB page outside of
the boottime reservations? In that case I think it would be nice to
clarify that the allocation failure case optimization is also for
guest_memfd, not only for hugetlbfs.
Symmetric charging is definitely welcome :-) All of your reasons make
sense to me, I just wanted to ask and make sure.
Thanks for your thoughts! I hope you have a great day!!
Joshua