Re: [PATCH v9 2/2] mm/memory hotplug/unplug: Optimize zone->contiguous update when changes pfn range

From: Li, Tianyou

Date: Tue Feb 10 2026 - 10:28:41 EST



On 2/10/2026 7:44 PM, Mike Rapoport wrote:
On Mon, Feb 09, 2026 at 01:44:45PM +0100, David Hildenbrand (Arm) wrote:
On 2/9/26 11:52, David Hildenbrand (Arm) wrote:
On 2/8/26 20:39, Mike Rapoport wrote:
On Sat, Feb 07, 2026 at 12:00:09PM +0100, David Hildenbrand (Arm) wrote:
Thanks for all your work on this and sorry for being slower with reviews
over the last month.

While I was in the shower I was thinking about how much I hate
zone->contiguous + the pageblock walking, and how we could just get
rid of it.

You know, just what you do while having a relaxing shower.


And I was wondering:

(a) in which case would we have zone_spanned_pages == zone_present_pages
and the zone *not* be contiguous? I assume this just cannot happen,
otherwise BUG.

(b) in which case would we have zone_spanned_pages != zone_present_pages
and the zone *be* contiguous? I assume in some cases where we have small
holes within a pageblock?

Reading the doc of __pageblock_pfn_to_page(), there are some weird
scenarios with holes in pageblocks.
It seems that "zone->contiguous" is a really bad name for what this thing
represents.

tl;dr I don't think zone_spanned_pages == zone_present_pages is related to
zone->contiguous at all :)
My point in (a) was that with "zone_spanned_pages == zone_present_pages"
there are no holes so -> contiguous.

(b), and what I said further below, is exactly about memory holes where
we have a memmap, but it's not present memory.

If you look at pageblock_pfn_to_page() and __pageblock_pfn_to_page(), the
check for zone->contiguous should guarantee that the entire pageblock has a
valid memory map and that the entire pageblock fits within a zone and does
not cross zone/node boundaries.
Right. But that must hold for each and every pageblock in the spanned
zone range for it to be contiguous.

zone->contiguous tells you "pfn_to_page() is valid on the complete zone
range".

That's why set_zone_contiguous() probes __pageblock_pfn_to_page() on each
and every pageblock.
For coldplug memory the memory map is valid for every section that has
present memory, i.e. even if there is a hole in a section, its memory map
will be populated and will have struct pages.
There is this sub-section thing, and holes larger than a section might
not have a memmap (unless reserved I guess).

When zone->contiguous is false, the slow path in __pageblock_pfn_to_page()
essentially checks if the first page in a pageblock is online and if the
first and last pages are in the zone being compacted.
AFAIU, in the hotplug case an entire pageblock is always onlined to the
same zone, so zone->contiguous won't change after the hotplug is complete.
I think you are missing a point: hot(un)plug might create holes in the
zone span. Then, pfn_to_page() is no longer valid to be called on
arbitrary pageblocks within the zone.

We might set it to false at the beginning of the hotplug to avoid scanning
offline pages, although I'm not sure if it's possible.

But at the end of hotplug we can simply restore the old value and
move on.
No, you might create holes.

For the coldplug case I'm also not sure it's worth the hassle, we could
just let compaction scan a few more pfns for those rare weird pageblocks
and bail out on wrong page conditions.
To recap:

My idea is that "zone_spanned_pages == zone_present_pages" tells you
that the zone is contiguous because there are no holes.

To handle "non-memory with a struct page", you'd have to check

    "zone_spanned_pages == zone_present_pages +
         zone_non_present_memmap_pages"

Or shorter

    "zone_spanned_pages == zone_pages_with_memmap"

Then, pfn_to_page() is valid within the complete zone.
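As a rough sketch of that invariant (illustration only; "non_present_memmap_pages"
is just the conceptual counter from above, not an existing struct zone field):

static inline bool zone_span_fully_backed_by_memmap(const struct zone *zone)
{
	/* Holes that still have a memmap count towards the memmap coverage. */
	unsigned long pages_with_memmap = zone->present_pages +
					  zone->non_present_memmap_pages;

	return zone->spanned_pages == pages_with_memmap;
}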

The question is how to best calculate the "zone_pages_with_memmap"
during boot.

During hot(un)plug we only add/remove zone_present_pages. The
zone_non_present_memmap_pages will not change due to hot(un)plug later.

The following hack does the trick. But

(a) I wish we could get rid of the pageblock walking in calc_online_pages().
(b) "online_pages" has weird semantics due to the pageblock handling.
"online_pageblock_pages"? not sure.
(c) Calculating "online_pages" when we know there is a hole does not make sense,
as we could just keep it 0 if there are holes and simply set
zone->online_pageblock_pages to zone->spanned_pages in case all are online.


From d4cb825e91a6363afc68fb994c5d9b29c38c5f42 Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Arm)" <david@xxxxxxxxxx>
Date: Mon, 9 Feb 2026 13:40:24 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
---
include/linux/mmzone.h | 25 +++++++++++++++++++++++--
mm/internal.h          |  8 +-------
mm/memory_hotplug.c    | 20 ++++++--------------
mm/mm_init.c           | 12 ++++++------
4 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fc5d6c88d2f0..3f7d8d88c597 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -943,6 +943,11 @@ struct zone {
* cma pages is present pages that are assigned for CMA use
* (MIGRATE_CMA).
*
+ * online_pages is pages within the zone that have an online memmap.
+ * online_pages include present pages and memory holes that have a
+ * memmap. When spanned_pages == online_pages, pfn_to_page() can be
+ * performed without further checks on any pfn within the zone span.
Maybe pages_with_memmap? It would stand out from managed, spanned and
present, but it's clearer than online IMHO.

+ *
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
@@ -967,6 +972,7 @@ struct zone {
atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;
+ unsigned long online_pages;
#if defined(CONFIG_MEMORY_HOTPLUG)
unsigned long present_early_pages;
#endif
@@ -1051,8 +1057,6 @@ struct zone {
bool compact_blockskip_flush;
#endif
- bool contiguous;
-
CACHELINE_PADDING(_pad3_);
/* Zone statistics */
atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
@@ -1124,6 +1128,23 @@ static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
}
+/**
+ * zone_is_contiguous - test whether a zone is contiguous
+ * @zone: the zone to test.
+ *
+ * In a contiguous zone, it is valid to call pfn_to_page() on any pfn in the
+ * spanned zone without requiring pfn_valid() or pfn_to_online_page() checks.
+ *
+ * Returns: true if contiguous, otherwise false.
+ */
+static inline bool zone_is_contiguous(const struct zone *zone)
+{
+ return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages);
+}
+
static inline bool zone_is_initialized(const struct zone *zone)
{
return zone->initialized;
diff --git a/mm/internal.h b/mm/internal.h
index f35dbcf99a86..6062f9b8ee62 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -716,21 +716,15 @@ extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
- if (zone->contiguous)
+ if (zone_is_contiguous(zone))
return pfn_to_page(start_pfn);
return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
}
-void set_zone_contiguous(struct zone *zone);
bool pfn_range_intersects_zones(int nid, unsigned long start_pfn,
unsigned long nr_pages);
-static inline void clear_zone_contiguous(struct zone *zone)
-{
- zone->contiguous = false;
-}
-
extern int __isolate_free_page(struct page *page, unsigned int order);
extern void __putback_isolated_page(struct page *page, unsigned int order,
int mt);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a63ec679d861..76496c1039a9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -492,11 +492,11 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_smallest_section_pfn(nid, zone, end_pfn,
zone_end_pfn(zone));
if (pfn) {
- zone->spanned_pages = zone_end_pfn(zone) - pfn;
+ WRITE_ONCE(zone->spanned_pages, zone_end_pfn(zone) - pfn);
zone->zone_start_pfn = pfn;
} else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
} else if (zone_end_pfn(zone) == end_pfn) {
/*
@@ -508,10 +508,10 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
pfn = find_biggest_section_pfn(nid, zone, zone->zone_start_pfn,
start_pfn);
if (pfn)
- zone->spanned_pages = pfn - zone->zone_start_pfn + 1;
+ WRITE_ONCE(zone->spanned_pages, pfn - zone->zone_start_pfn + 1);
else {
zone->zone_start_pfn = 0;
- zone->spanned_pages = 0;
+ WRITE_ONCE(zone->spanned_pages, 0);
}
}
}
@@ -565,18 +565,13 @@ void remove_pfn_range_from_zone(struct zone *zone,
/*
* Zone shrinking code cannot properly deal with ZONE_DEVICE. So
- * we will not try to shrink the zones - which is okay as
- * set_zone_contiguous() cannot deal with ZONE_DEVICE either way.
+ * we will not try to shrink the zones.
*/
if (zone_is_zone_device(zone))
return;
- clear_zone_contiguous(zone);
-
shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
update_pgdat_span(pgdat);
-
- set_zone_contiguous(zone);
}
/**
@@ -753,8 +748,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
struct pglist_data *pgdat = zone->zone_pgdat;
int nid = pgdat->node_id;
- clear_zone_contiguous(zone);
-
if (zone_is_empty(zone))
init_currently_empty_zone(zone, start_pfn, nr_pages);
resize_zone_range(zone, start_pfn, nr_pages);
@@ -782,8 +775,6 @@ void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
memmap_init_range(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype,
isolate_pageblock);
-
- set_zone_contiguous(zone);
}
struct auto_movable_stats {
@@ -1079,6 +1070,7 @@ void adjust_present_page_count(struct page *page, struct memory_group *group,
if (early_section(__pfn_to_section(page_to_pfn(page))))
zone->present_early_pages += nr_pages;
zone->present_pages += nr_pages;
+ WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages);
zone->zone_pgdat->node_present_pages += nr_pages;
if (group && movable)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 2a809cd8e7fa..e33caa6fb6fc 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2263,9 +2263,10 @@ void __init init_cma_pageblock(struct page *page)
}
#endif
-void set_zone_contiguous(struct zone *zone)
+static void calc_online_pages(struct zone *zone)
{
unsigned long block_start_pfn = zone->zone_start_pfn;
+ unsigned long online_pages = 0;
unsigned long block_end_pfn;
block_end_pfn = pageblock_end_pfn(block_start_pfn);
@@ -2277,12 +2278,11 @@ void set_zone_contiguous(struct zone *zone)
if (!__pageblock_pfn_to_page(block_start_pfn,
block_end_pfn, zone))
- return;
+ continue;
cond_resched();
+ online_pages += block_end_pfn - block_start_pfn;
I think we can completely get rid of this with something like this untested
patch to calculate zone->online_pages for coldplug:

diff --git a/mm/mm_init.c b/mm/mm_init.c
index e33caa6fb6fc..ff2f75e7b49f 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -845,9 +845,9 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
* zone/node above the hole except for the trailing pages in the last
* section that will be appended to the zone/node below.
*/
-static void __init init_unavailable_range(unsigned long spfn,
- unsigned long epfn,
- int zone, int node)
+static u64 __init init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
{
unsigned long pfn;
u64 pgcnt = 0;
@@ -861,6 +861,8 @@ static void __init init_unavailable_range(unsigned long spfn,
if (pgcnt)
pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
node, zone_names[zone], pgcnt);
+
+ return pgcnt;
}
/*
@@ -959,9 +961,10 @@ static void __init memmap_init_zone_range(struct zone *zone,
memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE,
false);
+ zone->online_pages += (end_pfn - start_pfn);
if (*hole_pfn < start_pfn)
- init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+ zone->online_pages += init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
*hole_pfn = end_pfn;
}



Sorry for the late response, I am trying to catch up with the discussion. :) Per my understanding, zone->contiguous combines two semantics: one is that pages completely fill the zone span, the other is that those pages can be accessed because they have been onlined. The check zone_spanned_pages == zone_online_pages guarantees both. Both resize_zone_range() and shrink_zone_span() change zone_spanned_pages, so they need to use WRITE_ONCE() so that the lockless READ_ONCE() readers in zone_is_contiguous() see consistent values; the same applies to adjust_present_page_count(), where zone_online_pages gets updated.
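A minimal sketch of the pairing I have in mind (simplified;
zone_account_online_pages() is a made-up helper, not code from the patch):

/* Lockless reader: must not observe torn values of either counter. */
static inline bool zone_is_contiguous(const struct zone *zone)
{
	return READ_ONCE(zone->spanned_pages) == READ_ONCE(zone->online_pages);
}

/*
 * Updaters run serialized under the hotplug locks, so the plain read of
 * the old value is fine; only the store needs WRITE_ONCE() to pair with
 * the lockless reader above.
 */
static void zone_account_online_pages(struct zone *zone, long nr_pages)
{
	WRITE_ONCE(zone->online_pages, zone->online_pages + nr_pages);
}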

Regards,

Tianyou