Re: [PATCHv6 11/17] mm/hugetlb: Remove fake head pages

From: David Hildenbrand (Arm)

Date: Fri Feb 06 2026 - 04:36:37 EST


On 2/2/26 16:56, Kiryl Shutsemau wrote:
HugeTLB Vmemmap Optimization (HVO) reduces memory usage by freeing most
vmemmap pages for huge pages and remapping the freed range to a single
page containing the struct page metadata.

With the new mask-based compound_info encoding (for power-of-2 struct
page sizes), all tail pages of the same order are now identical
regardless of which compound page they belong to. This means the tail
pages can be truly shared without fake heads.

Allocate a single page of initialized tail struct pages per NUMA node
per order in the vmemmap_tails[] array in pglist_data. All huge pages of
that order on the node share this tail page, mapped read-only into their
vmemmap. The head page remains unique per huge page.

Redefine MAX_FOLIO_ORDER using ilog2(). The define has to produce a
compile-time constant, as it is used to size the vmemmap_tails[] array.
For some reason, the compiler is unable to resolve get_order() at
compile time, but ilog2() works.

Avoid PUD_ORDER when defining MAX_FOLIO_ORDER, as it would add a
dependency on <linux/pgtable.h>, which creates a hard-to-break include
loop.

This eliminates fake heads while maintaining the same memory savings,
and simplifies compound_head() by removing fake head detection.

Signed-off-by: Kiryl Shutsemau <kas@xxxxxxxxxx>
---

[...]

#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index a39a301e08b9..688764c52c72 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -19,6 +19,7 @@
#include <asm/tlbflush.h>
#include "hugetlb_vmemmap.h"
+#include "internal.h"
/**
* struct vmemmap_remap_walk - walk vmemmap page table
@@ -505,6 +506,32 @@ static bool vmemmap_should_optimize_folio(const struct hstate *h, struct folio *
return true;
}
+static struct page *vmemmap_get_tail(unsigned int order, int node)
+{
+ struct page *tail, *p;
+ unsigned int idx;
+
+ idx = order - VMEMMAP_TAIL_MIN_ORDER;

Could do

const unsigned int idx = order - VMEMMAP_TAIL_MIN_ORDER;

above.

+ tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
+ if (tail)

Wondering if a likely() would be a good idea here. I guess we'll usually go through that fast path on a system that has been running for a bit.

+ return tail;
+
+ tail = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+ if (!tail)
+ return NULL;
+
+ p = page_to_virt(tail);
+ for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
+ prep_compound_tail(p + i, NULL, order);

This leaves all pageflags, refcount etc. set to 0, which is mostly expected for tail pages.

But I would have expected something closer to __init_single_page() that initializes the page properly.

In particular:
* set_page_node(page, node), or how is page_to_nid() handled?
* atomic_set(&page->_mapcount, -1), to not indicate something odd to
core-mm where we would suddenly have a page mapping for a hugetlb
folio.

+
+ if (cmpxchg(&NODE_DATA(node)->vmemmap_tails[idx], NULL, tail)) {
+ __free_page(tail);
+ tail = READ_ONCE(NODE_DATA(node)->vmemmap_tails[idx]);
+ }
+
+ return tail;
+}

[...]

--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -378,16 +378,44 @@ void vmemmap_wrprotect_hvo(unsigned long addr, unsigned long end,
}
}
-/*
- * Populate vmemmap pages HVO-style. The first page contains the head
- * page and needed tail pages, the other ones are mirrors of the first
- * page.
- */
+static __meminit unsigned long vmemmap_get_tail(unsigned int order, int node)
+{
+ struct page *p, *tail;
+ unsigned int idx;
+
+ BUG_ON(order < VMEMMAP_TAIL_MIN_ORDER);
+ BUG_ON(order > MAX_FOLIO_ORDER);
+
+ idx = order - VMEMMAP_TAIL_MIN_ORDER;
+ tail = NODE_DATA(node)->vmemmap_tails[idx];
+ if (tail)
+ return page_to_pfn(tail);
+
+ p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
+ if (!p)
+ return 0;
+
+ for (int i = 0; i < PAGE_SIZE / sizeof(struct page); i++)
+ prep_compound_tail(p + i, NULL, order);
+
+ tail = virt_to_page(p);
+ NODE_DATA(node)->vmemmap_tails[idx] = tail;
+
+ return page_to_pfn(tail);
+}
+
int __meminit vmemmap_populate_hvo(unsigned long addr, unsigned long end,
int node, unsigned long headsize)
{
+ unsigned long maddr, len, tail_pfn;
+ unsigned int order;
pte_t *pte;
- unsigned long maddr;
+
+ len = end - addr;
+ order = ilog2(len * sizeof(struct page) / PAGE_SIZE);


Could initialize them as const above.

But I am wondering whether it shouldn't be the caller that provides this to us? After all, it's all hugetlb code that allocates and prepares that.

Then we could maybe change

#ifdef CONFIG_SPARSEMEM_VMEMMAP
	struct page *vmemmap_tails[NR_VMEMMAP_TAILS];
#endif

to be HVO-only.

--
Cheers,

David