Re: [PATCH v7 1/2] mm/memory hotplug: fix zone->contiguous always false when hotplug

From: Li, Tianyou
Date: Thu Jan 08 2026 - 02:35:47 EST


Thank you very much for your review, David! Your suggestions are clear, and the code and comments you posted are well formatted; I can even copy and paste them without modification. Thanks.


Regards,

Tianyou

On 1/7/2026 4:03 AM, David Hildenbrand (Red Hat) wrote:
On 12/22/25 15:58, Tianyou Li wrote:
Function set_zone_contiguous used __pageblock_pfn_to_page to
check the whole pageblock is in the same zone. One assumption is
the memory section must online, otherwise the __pageblock_pfn_to_page
will return NULL, then the set_zone_contiguous will be false.
When move_pfn_range_to_zone invoked set_zone_contiguous, since the
memory section did not online, the return value will always be false.

To fix this issue, we removed the set_zone_contiguous from the
move_pfn_range_to_zone, and place it after memory section onlined.

Function remove_pfn_range_from_zone did not have this issue because
memory section remains online at the time set_zone_contiguous invoked.

The description is a bit hard to follow. Let me try:


"set_zone_contiguous() uses __pageblock_pfn_to_page() to detect pageblocks that either do not exist (hole) or that do not belong to the same zone.

__pageblock_pfn_to_page(), however, relies on pfn_to_online_page(), effectively always returning NULL for memory ranges that were not onlined yet. So when called on a range-to-be-onlined, it indicates a memory hole to set_zone_contiguous().

Consequently, the set_zone_contiguous() call in move_pfn_range_to_zone(), which happens early during memory onlining, will never detect a zone as being contiguous. Bad.

To fix the issue, move the set_zone_contiguous() call to a later stage
in memory onlining, where pfn_to_online_page() will succeed: after we
mark the memory sections to be online"

Will change accordingly.  Thanks.
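
To illustrate the ordering problem described above, here is a minimal userspace sketch in C. It is a toy model, not kernel code: every toy_* name, the section size and the zone id are assumptions made up for this example, and the contiguity check is a simplified per-pfn loop rather than the real per-pageblock logic. It only shows that a lookup which returns NULL for offline sections makes the check fail until the sections are marked online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy geometry; real section sizes are architecture specific. */
#define TOY_PAGES_PER_SECTION 4UL
#define TOY_NR_SECTIONS       4UL
#define TOY_NR_PAGES          (TOY_PAGES_PER_SECTION * TOY_NR_SECTIONS)

struct toy_page {
	int zone_id;	/* -1 means "not assigned to any zone" */
};

static struct toy_page toy_memmap[TOY_NR_PAGES];
static bool toy_section_online[TOY_NR_SECTIONS];

/* Put the toy model back into the "nothing hotplugged yet" state. */
void toy_reset(void)
{
	for (unsigned long pfn = 0; pfn < TOY_NR_PAGES; pfn++)
		toy_memmap[pfn].zone_id = -1;
	for (unsigned long i = 0; i < TOY_NR_SECTIONS; i++)
		toy_section_online[i] = false;
}

/* Stand-in for pfn_to_online_page(): NULL unless the section is online. */
struct toy_page *toy_pfn_to_online_page(unsigned long pfn)
{
	if (pfn >= TOY_NR_PAGES)
		return NULL;
	if (!toy_section_online[pfn / TOY_PAGES_PER_SECTION])
		return NULL;
	return &toy_memmap[pfn];
}

/* Simplified contiguity check: every pfn must be online and in zone_id. */
bool toy_zone_is_contiguous(unsigned long start, unsigned long end, int zone_id)
{
	for (unsigned long pfn = start; pfn < end; pfn++) {
		struct toy_page *page = toy_pfn_to_online_page(pfn);

		if (!page || page->zone_id != zone_id)
			return false;	/* looks like a hole or a foreign zone */
	}
	return true;
}

/* Early onlining step: assign the range to a zone (sections stay offline). */
void toy_move_range_to_zone(unsigned long start, unsigned long end, int zone_id)
{
	assert(start < end && end <= TOY_NR_PAGES);
	for (unsigned long pfn = start; pfn < end; pfn++)
		toy_memmap[pfn].zone_id = zone_id;
}

/* Later onlining step: mark every section touched by the range online. */
void toy_mark_sections_online(unsigned long start, unsigned long end)
{
	assert(start < end && end <= TOY_NR_PAGES);
	for (unsigned long pfn = start; pfn < end; pfn++)
		toy_section_online[pfn / TOY_PAGES_PER_SECTION] = true;
}
```

In this model, calling toy_zone_is_contiguous() right after toy_move_range_to_zone() reports false, because the lookup still returns NULL for every pfn. Calling it after toy_mark_sections_online() reports true, which mirrors the reason for moving the check to a later stage of onlining.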



Now, there is no need to add the handling to mhp_init_memmap_on_memory(). Note how mhp_init_memmap_on_memory() in memory_block_online() is always followed by online_pages().

Also, set_zone_contiguous() no longer depends on the previous zone contiguous state, so it makes sense to remove the set_zone_contiguous() call from mhp_init_memmap_on_memory() as you suggested.


So, it's sufficient to move it after the online_pages_range(). I would also add a comment there saying something like:

/*
 * Now that the ranges are indicated as online, check whether the whole
 * zone is contiguous.
 */

Will change accordingly. Thanks.



Can we find some Fixes: tag (which commit introduced the regression)? Likely we want to CC stable.

Yes, we can probably add the tags below. That commit changed pfn_to_page() to pfn_to_online_page() in __pageblock_pfn_to_page().

Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Cc: Michal Hocko <mhocko@xxxxxxxx>