Re: [PATCH] mm/hugetlb: use separate nodemask for bootmem allocations

From: Oscar Salvador
Date: Wed Apr 09 2025 - 03:48:00 EST


On Wed, Apr 02, 2025 at 08:56:13PM +0000, Frank van der Linden wrote:
> Hugetlb boot allocation has used online nodes for allocation since
> commit de55996d7188 ("mm/hugetlb: use online nodes for bootmem
> allocation"). This was needed to be able to do the allocations
> earlier in boot, before N_MEMORY was set.
>
> This might lead to a different distribution of gigantic hugepages
> across NUMA nodes if there are memoryless nodes in the system.
>
> What happens is that the memoryless nodes are tried, but then
> the memblock allocation fails and falls back, which usually means
> that the node that has the highest physical address available
> will be used (top-down allocation). While this will end up
> getting the same number of hugetlb pages, they might not be
> distributed the same way. The fallback for each memoryless
> node might not end up coming from the same node as the
> successful round-robin allocation from N_MEMORY nodes.
>
> While administrators that rely on having a specific number of
> hugepages per node should use the hugepages=N:X syntax, it's
> better not to change the old behavior for the plain hugepages=N
> case.
>
> To do this, construct a nodemask for hugetlb bootmem purposes
> only, containing nodes that have memory. Then use that
> for round-robin bootmem allocations.
>
> This saves some cycles, and the added advantage here is that
> hugetlb_cma can use it too, avoiding the older issue of
> pointless attempts to create a CMA area for memoryless nodes
> (which will also cause the per-node CMA area size to be too
> small).
>
> Fixes: de55996d7188 ("mm/hugetlb: use online nodes for bootmem allocation")
> Signed-off-by: Frank van der Linden <fvdl@xxxxxxxxxx>

This looks good to me.

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

The only thing I was pondering is whether there would be a way
to keep hugetlb_bootmem_set_nodes() confined to the hugetlb code,
without having to export it to hugetlb_cma.

But then again, you would have to create a function that calls
hugetlb_bootmem_set_nodes() earlier, and that would be churn for
churn's sake.
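
For anyone following along, the distribution change described in the
changelog can be illustrated with a small user-space toy model. This is
not kernel code: struct toy_topology, distribute_online() and
distribute_memory_only() are hypothetical names made up for this sketch,
and the memblock fallback is simplified to a single fixed node (the
changelog only says it is usually the node with the highest available
physical address).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Toy model of a NUMA topology; purely illustrative, not a kernel structure. */
struct toy_topology {
	int nr_nodes;               /* nodes 0..nr_nodes-1 are "online" */
	bool has_memory[MAX_NODES]; /* false marks a memoryless node */
	int fallback_node;          /* assumed target when an allocation falls back */
};

/*
 * Round-robin over all online nodes. An attempt on a memoryless node
 * "fails" and lands on the fallback node instead.
 */
void distribute_online(const struct toy_topology *t, int nr_pages,
		       int pages_per_node[MAX_NODES])
{
	int node = 0;

	assert(t->nr_nodes > 0 && t->nr_nodes <= MAX_NODES);
	assert(t->has_memory[t->fallback_node]);

	for (int i = 0; i < MAX_NODES; i++)
		pages_per_node[i] = 0;

	for (int i = 0; i < nr_pages; i++) {
		int target = t->has_memory[node] ? node : t->fallback_node;

		pages_per_node[target]++;
		node = (node + 1) % t->nr_nodes;
	}
}

/* Round-robin over nodes with memory only; memoryless nodes are skipped. */
void distribute_memory_only(const struct toy_topology *t, int nr_pages,
			    int pages_per_node[MAX_NODES])
{
	int node = 0;
	bool any_memory = false;

	assert(t->nr_nodes > 0 && t->nr_nodes <= MAX_NODES);
	for (int i = 0; i < t->nr_nodes; i++)
		any_memory = any_memory || t->has_memory[i];
	assert(any_memory); /* otherwise the skip loop below would never end */

	for (int i = 0; i < MAX_NODES; i++)
		pages_per_node[i] = 0;

	for (int i = 0; i < nr_pages; i++) {
		while (!t->has_memory[node])
			node = (node + 1) % t->nr_nodes;
		pages_per_node[node]++;
		node = (node + 1) % t->nr_nodes;
	}
}
```

With four nodes, node 1 memoryless, node 3 as the assumed fallback and
eight pages requested, the first function piles the extra pages onto
node 3, while the second spreads them evenly over nodes 0, 2 and 3.
Both end up with the same total, which matches the "same number, but
different distribution" point in the changelog.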


--
Oscar Salvador
SUSE Labs