Re: [PATCH 09/14] mm/sparse: remove CONFIG_MEMORY_HOTPLUG-specific usemap allocation handling

From: David Hildenbrand (Arm)

Date: Fri Mar 20 2026 - 14:50:02 EST


On 3/17/26 20:48, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 05:56:47PM +0100, David Hildenbrand (Arm) wrote:
>> In 2008, we added through commit 48c906823f39 ("memory hotplug: allocate
>> usemap on the section with pgdat") quite some complexity to try
>> allocating memory for the "usemap" (storing pageblock information
>> per memory section) for a memory section close to the memory of the
>> "pgdat" of the node.
>>
>> The goal was to make memory hotunplug of boot memory more likely to
>> succeed. That commit also added some checks for circular dependencies
>> between two memory sections, whereby two memory sections would contain
>> each others usemap, turning bot memory sections un-removable.
>
> Typo: bot -> both. Presumably you are not talking about memory a bot of some
> kind allocated :P
>
>>
>> However, in 2010, commit a4322e1bad91 ("sparsemem: Put usemap for one node
>> together") started allocating the usemap for multiple memory
>> sections on the same node in one chunk, effectively grouping all usemap
>> allocations of the same node in a single memblock allocation.
>>
>> We don't really give guarantees about memory hotunplug of boot memory, and
>> with the change in 2010, it is pretty much impossible in practice to get
>> any circular dependencies.
>
> Pretty much impossible? :) We can probably go so far as to say impossible, no?

Yes.

>
>>
>> commit 48c906823f39 ("memory hotplug: allocate usemap on the section with
>> pgdat") also added the comment:
>>
>> "Similarly, a pgdat can prevent a section being removed. If
>> section A contains a pgdat and section B
>> contains the usemap, both sections become inter-dependent."
>>
>> Given that we don't free the pgdat anymore, that comment (and handling)
>> does not apply.
>
> Isn't pgdat synonymous with a node and that's the data structure that describes
> a node right? Confusingly typedef'd from pglist_data to pg_data_t but then
> referred to as pgdat because all that makes so much sense :)

Yeah, in general we refer to the NODE_DATA as pgdat (grep for it and
you'll be surprised).

>
> But I'm confused, does a section containing a pgdat mean a section having the
> pgdat data structure literally allocated in it?

Yes. The "struct pglist_data" (pg_data_t) is placed in some memory section.

>
> A usemap is... something that tracks pageblock metadata I think right?

Yes. Essentially a large array of bytes, whereby each byte describes the
data of one pageblock (migratetype etc.).
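
As a rough user-space sketch of that idea (all names, sizes and
migratetype values below are made up for illustration; this is not the
kernel's actual layout or API), a per-section usemap with one byte per
pageblock could be modeled like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define DEMO_PAGES_PER_SECTION   32768UL /* e.g. 128 MiB section, 4 KiB pages */
#define DEMO_PAGES_PER_PAGEBLOCK 512UL   /* e.g. 2 MiB pageblock */
#define DEMO_BLOCKS_PER_SECTION \
	(DEMO_PAGES_PER_SECTION / DEMO_PAGES_PER_PAGEBLOCK)

/* Hypothetical migratetype values; not the kernel's enum. */
enum demo_migratetype {
	DEMO_UNMOVABLE = 0,
	DEMO_MOVABLE = 1,
	DEMO_RECLAIMABLE = 2,
};

/* One byte of metadata per pageblock of a single memory section. */
struct demo_usemap {
	uint8_t block[DEMO_BLOCKS_PER_SECTION];
};

/* Map a page frame offset within the section to its pageblock index. */
static size_t demo_block_index(unsigned long pfn_in_section)
{
	assert(pfn_in_section < DEMO_PAGES_PER_SECTION);
	return pfn_in_section / DEMO_PAGES_PER_PAGEBLOCK;
}

/* Store the migratetype for the pageblock that contains the given page. */
static void demo_set_migratetype(struct demo_usemap *u,
				 unsigned long pfn_in_section,
				 enum demo_migratetype mt)
{
	u->block[demo_block_index(pfn_in_section)] = (uint8_t)mt;
}

/* Look up the migratetype of the pageblock that contains the given page. */
static enum demo_migratetype
demo_get_migratetype(const struct demo_usemap *u, unsigned long pfn_in_section)
{
	return (enum demo_migratetype)u->block[demo_block_index(pfn_in_section)];
}
```

All pages of one pageblock share a single entry, so the usemap stays
tiny compared to the memory section it describes.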

>
> Anyway I'm also confused by 'given we don't free the pgdat any more', but the
> comment says a 'pgdat can prevent a section being removed' rather than anything
> about it being removed?

Well, if a pgdat resides in some memory section, then, given that it is
unmovable, it turns the whole memory section unremovable -> hotunplug fails.

Assuming you could free the pgdat when the node goes offline, you
would turn that memory section removable.

And I think that commit somehow assumed that the last memory section
could be removed if all it contains is the corresponding pgdat (which
was never the case).

>
> I guess it means the OTHER section could be prevented from being removed even
> after it's gone.. somehow?
>
> Anyway! I think maybe this could be clearer, somehow :)

I'm afraid the whole purpose of the original patch was sketchy, which is
also why I fail to even explain the original motivation clearly.

Now it's fortunately no longer required. :)

>
>>
>> So let's simply remove this complexity.
>>
>> Signed-off-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
>
> I think what you've done in the patch is right though, we're not doing any of
> these dances after a4322e1bad91 and pgdats sitting around mean we don't really
> care about where the usemap goes anyway I don't think so...
>
> I usemap and I find myself in a place where I give you a:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx>
>

Thanks ;)

[...]

>> -
>> #ifdef CONFIG_SPARSEMEM_VMEMMAP
>> unsigned long __init section_map_size(void)
>> {
>> @@ -486,7 +390,6 @@ void __init sparse_init_early_section(int nid, struct page *map,
>> unsigned long pnum, unsigned long flags)
>> {
>> BUG_ON(!sparse_usagebuf || sparse_usagebuf >= sparse_usagebuf_end);
>> - check_usemap_section_nr(nid, sparse_usagebuf);
>> sparse_init_one_section(__nr_to_section(pnum), pnum, map,
>> sparse_usagebuf, SECTION_IS_EARLY | flags);
>> sparse_usagebuf = (void *)sparse_usagebuf + mem_section_usage_size();
>> @@ -497,8 +400,7 @@ static int __init sparse_usage_init(int nid, unsigned long map_count)
>> unsigned long size;
>>
>> size = mem_section_usage_size() * map_count;
>> - sparse_usagebuf = sparse_early_usemaps_alloc_pgdat_section(
>> - NODE_DATA(nid), size);
>> + sparse_usagebuf = memblock_alloc_node(size, SMP_CACHE_BYTES, nid);
>
> I guess nid here is the same node as the pgdat?

Yes! Before, we used NODE_DATA(nid)->node_id, which is really just ... nid :)
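
To illustrate the single-chunk-per-node scheme the hunks above rely
on, here is a minimal user-space model. It is only a sketch: calloc()
stands in for memblock_alloc_node(), DEMO_USAGE_SIZE stands in for
mem_section_usage_size(), and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-section usage size; stand-in for the real helper. */
#define DEMO_USAGE_SIZE 64

struct demo_usagebuf {
	char *base; /* start of the chunk, kept so it can be freed */
	char *cur;  /* next unused slice */
	char *end;  /* one past the end of the chunk */
};

/* Allocate one chunk that covers all map_count sections of a node. */
static int demo_usage_init(struct demo_usagebuf *b, unsigned long map_count)
{
	size_t size = (size_t)DEMO_USAGE_SIZE * map_count;

	b->base = calloc(1, size); /* assumption: models the per-node allocation */
	if (!b->base)
		return -1;
	b->cur = b->base;
	b->end = b->base + size;
	return 0;
}

/* Hand out the next per-section slice and advance the cursor. */
static void *demo_usage_next(struct demo_usagebuf *b)
{
	void *slice;

	assert(b->cur && b->cur < b->end);
	slice = b->cur;
	b->cur += DEMO_USAGE_SIZE;
	return slice;
}

/* Release the whole chunk (needed only in this user-space model). */
static void demo_usage_fini(struct demo_usagebuf *b)
{
	free(b->base);
	b->base = b->cur = b->end = NULL;
}
```

Every section of the node gets a slice from the same allocation, so
there is no per-section placement decision left to make.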

--
Cheers,

David