Re: [PATCH v2 6/6] mm: hugetlb: Refactor out hugetlb_alloc_folio()

From: Oscar Salvador

Date: Tue May 12 2026 - 09:41:56 EST

On Wed, May 06, 2026 at 08:54:42AM -0700, Ackerley Tng via B4 Relay wrote:
> From: Ackerley Tng <ackerleytng@xxxxxxxxxx>
>
> Refactor out hugetlb_alloc_folio() from alloc_hugetlb_folio(), which
> handles allocating a folio and charging it to the memory and HugeTLB cgroups.
>
> This refactoring decouples the HugeTLB page allocation from VMAs,
> specifically:
>
> 1. Reservations (as in resv_map) are stored in the vma
> 2. mpol is stored at vma->vm_policy
> 3. A vma must be used for allocation even if the pages are not meant to be
> used by the host process.
>
> Without this coupling, VMAs are no longer a requirement for
> allocation. This opens up the allocation routine for usage without VMAs,
> which will allow guest_memfd to use HugeTLB as a more generic allocator of
> huge pages, since guest_memfd memory may not have any associated VMAs by
> design. In addition, direct allocations from HugeTLB could possibly be
> refactored to avoid the use of a pseudo-VMA.
>
> Also, this decouples HugeTLB page allocation from HugeTLBfs, where the
> subpool is stored at the fs mount. This is also a requirement for
> guest_memfd, where the plan is to have a subpool created per-fd and stored
> on the inode.
>
> No functional change intended.
>
> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>

I have yet to review this more thoroughly, but I have a comment below:

> ---
> include/linux/hugetlb.h | 3 +
> mm/hugetlb.c | 179 ++++++++++++++++++++++++++----------------------
> 2 files changed, 100 insertions(+), 82 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 93418625d3c5f..ec205d8580885 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -705,6 +705,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
> int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list);
> int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> void wait_for_freed_hugetlb_folios(void);
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> + struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> + bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation);
> struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> unsigned long addr, bool cow_from_owner);
> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4159b3565a9be..a1c5b94e52e0a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2821,6 +2821,88 @@ void wait_for_freed_hugetlb_folios(void)
> flush_work(&free_hpage_work);
> }
>
> +struct folio *hugetlb_alloc_folio(struct hstate *h, struct hugepage_subpool *spool,
> + struct mempolicy *mpol, int nid, nodemask_t *nodemask,
> + bool charge_hugetlb_cgroup_rsvd, bool use_global_reservation)

I think I would put that information into a context struct that we can
pass to hugetlb_alloc_folio(); otherwise the signature seems too
overloaded, and we may need to add more parameters in the future to
tweak the allocation even further. E.g.:

struct hugetlb_alloc_ctxt {
	struct hstate *h;
	struct hugepage_subpool *spool;
	gfp_t gfp_mask;
	...
};
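A minimal userspace sketch of that idea follows; it is not kernel code. The
kernel types are reduced to opaque stand-ins, and the field names of
`struct hugetlb_alloc_ctxt` and the mock body of `hugetlb_alloc_folio()` are
hypothetical. The fields simply mirror the parameters of the patch's
`hugetlb_alloc_folio()`, plus the `gfp_mask` from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types (hypothetical mock). */
struct hstate { int order; };
struct hugepage_subpool { long used; };
struct mempolicy { int mode; };
typedef struct { unsigned long bits; } nodemask_t;
typedef unsigned int gfp_t;
struct folio { int nid; };

/*
 * Hypothetical allocation context: one field per parameter of the
 * patch's hugetlb_alloc_folio(), plus room for future knobs.
 */
struct hugetlb_alloc_ctxt {
	struct hstate *h;
	struct hugepage_subpool *spool;
	struct mempolicy *mpol;
	int nid;
	nodemask_t *nodemask;
	gfp_t gfp_mask;
	bool charge_hugetlb_cgroup_rsvd;
	bool use_global_reservation;
};

static struct folio mock_folio;

/*
 * Mock body: a real allocator would do the subpool, reservation and
 * cgroup work; this only demonstrates the single-argument convention.
 */
struct folio *hugetlb_alloc_folio(const struct hugetlb_alloc_ctxt *ctx)
{
	assert(ctx != NULL && ctx->h != NULL);	/* hstate is mandatory */
	mock_folio.nid = ctx->nid;
	return &mock_folio;
}
```

With designated initializers, a caller names only the fields it needs, e.g.
`struct hugetlb_alloc_ctxt ctx = { .h = h, .spool = spool, .nid = nid };`,
and the remaining fields are zero-initialized. A field added later therefore
defaults to zero without touching existing callers.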

Maybe we can go even further and convert those booleans into action flags.
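One possible shape for such flags, as a hedged userspace sketch: the flag
names, their values and the `hugetlb_decode_flags()` helper are all
hypothetical. In-tree code would more likely spell the bits with the
kernel's BIT() macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags replacing the two booleans in the patch. */
#define HUGETLB_ALLOC_CHARGE_CGROUP_RSVD	(1U << 0)
#define HUGETLB_ALLOC_USE_GLOBAL_RESV		(1U << 1)
#define HUGETLB_ALLOC_VALID_FLAGS \
	(HUGETLB_ALLOC_CHARGE_CGROUP_RSVD | HUGETLB_ALLOC_USE_GLOBAL_RESV)

/* The booleans the allocator would act on internally. */
struct hugetlb_alloc_opts {
	bool charge_hugetlb_cgroup_rsvd;
	bool use_global_reservation;
};

/* Hypothetical helper: decode a flags word into the old booleans. */
static struct hugetlb_alloc_opts hugetlb_decode_flags(unsigned int flags)
{
	/* Unknown bits indicate a caller bug. */
	assert((flags & ~HUGETLB_ALLOC_VALID_FLAGS) == 0);
	return (struct hugetlb_alloc_opts) {
		.charge_hugetlb_cgroup_rsvd =
			flags & HUGETLB_ALLOC_CHARGE_CGROUP_RSVD,
		.use_global_reservation =
			flags & HUGETLB_ALLOC_USE_GLOBAL_RESV,
	};
}
```

A call ending in `..., true, false)` would then read as
`..., HUGETLB_ALLOC_CHARGE_CGROUP_RSVD)`, and a new behaviour costs one new
bit rather than one more positional argument.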

I have the feeling that, as is, this code is quite ad hoc. If we want to
open up hugetlb allocations to other users, we should make the interface
as generic as possible, so that we do not have to change the API whenever
a new user pops up.

--
Oscar Salvador
SUSE Labs