Re: [PATCH v2] mm/memory hotplug/unplug: Optimize zone contiguous check when changing pfn range
From: David Hildenbrand (Arm)
Date: Thu Apr 02 2026 - 11:05:47 EST
On 4/1/26 09:01, Yuan Liu wrote:
> When move_pfn_range_to_zone() or remove_pfn_range_from_zone() updates a
> zone, set_zone_contiguous() rescans the entire zone pageblock-by-pageblock
> to rebuild zone->contiguous. For large zones this is a significant cost
> during memory hotplug and hot-unplug.
>
> Add a new zone member pages_with_online_memmap that tracks the number of
> pages within the zone span that have an online memmap (including present
> pages and memory holes whose memmap has been initialized). When
> spanned_pages == pages_with_online_memmap the zone is contiguous and
> pfn_to_page() can be called on any PFN in the zone span without further
> pfn_valid() checks.
>
> Only pages that fall within the current zone span are accounted towards
> pages_with_online_memmap. A "too small" value is safe; it merely prevents
> the zone from being detected as contiguous.
>
> The following test cases of memory hotplug for a VM [1], tested in the
> environment [2], show that this optimization can significantly reduce the
> memory hotplug time [3].
>
> +----------------+------+---------------+--------------+----------------+
> | | Size | Time (before) | Time (after) | Time Reduction |
> | +------+---------------+--------------+----------------+
> | Plug Memory | 256G | 10s | 3s | 70% |
> | +------+---------------+--------------+----------------+
> | | 512G | 36s | 7s | 81% |
> +----------------+------+---------------+--------------+----------------+
>
> +----------------+------+---------------+--------------+----------------+
> | | Size | Time (before) | Time (after) | Time Reduction |
> | +------+---------------+--------------+----------------+
> | Unplug Memory | 256G | 11s | 4s | 64% |
> | +------+---------------+--------------+----------------+
> | | 512G | 36s | 9s | 75% |
> +----------------+------+---------------+--------------+----------------+
>
> [1] Qemu commands to hotplug 256G/512G memory for a VM:
> object_add memory-backend-ram,id=hotmem0,size=256G/512G,share=on
> device_add virtio-mem-pci,id=vmem1,memdev=hotmem0,bus=port1
> qom-set vmem1 requested-size 256G/512G (Plug Memory)
> qom-set vmem1 requested-size 0G (Unplug Memory)
>
> [2] Hardware : Intel Icelake server
> Guest Kernel : v7.0-rc4
> Qemu : v9.0.0
>
> Launch VM :
> qemu-system-x86_64 -accel kvm -cpu host \
> -drive file=./Centos10_cloud.qcow2,format=qcow2,if=virtio \
> -drive file=./seed.img,format=raw,if=virtio \
> -smp 3,cores=3,threads=1,sockets=1,maxcpus=3 \
> -m 2G,slots=10,maxmem=2052472M \
> -device pcie-root-port,id=port1,bus=pcie.0,slot=1,multifunction=on \
> -device pcie-root-port,id=port2,bus=pcie.0,slot=2 \
> -nographic -machine q35 \
> -nic user,hostfwd=tcp::3000-:22
>
> Guest kernel auto-onlines newly added memory blocks:
> echo online > /sys/devices/system/memory/auto_online_blocks
>
> [3] The time from issuing the QEMU commands in [1] until the output of
>     'grep MemTotal /proc/meminfo' in the guest reflects that all
>     hotplugged memory has been recognized.
>
> Reported-by: Nanhai Zou <nanhai.zou@xxxxxxxxx>
> Reported-by: Chen Zhang <zhangchen.kidd@xxxxxx>
> Tested-by: Yuan Liu <yuan1.liu@xxxxxxxxx>
> Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
> Reviewed-by: Yu C Chen <yu.c.chen@xxxxxxxxx>
> Reviewed-by: Pan Deng <pan.deng@xxxxxxxxx>
> Reviewed-by: Nanhai Zou <nanhai.zou@xxxxxxxxx>
> Co-developed-by: Tianyou Li <tianyou.li@xxxxxxxxx>
> Signed-off-by: Tianyou Li <tianyou.li@xxxxxxxxx>
> Signed-off-by: Yuan Liu <yuan1.liu@xxxxxxxxx>
> ---
> Documentation/mm/physical_memory.rst | 11 +++++
> drivers/base/memory.c | 6 +++
> include/linux/mmzone.h | 44 +++++++++++++++++++
> mm/internal.h | 8 +---
> mm/memory_hotplug.c | 12 +-----
> mm/mm_init.c | 64 +++++++++++++++++-----------
> 6 files changed, 102 insertions(+), 43 deletions(-)
>
> diff --git a/Documentation/mm/physical_memory.rst b/Documentation/mm/physical_memory.rst
> index b76183545e5b..e47e96ef6a6d 100644
> --- a/Documentation/mm/physical_memory.rst
> +++ b/Documentation/mm/physical_memory.rst
> @@ -483,6 +483,17 @@ General
> ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
> is initialized by ``calculate_node_totalpages()``.
>
> +``pages_with_online_memmap``
> + Tracks pages within the zone that have an online memmap (present pages and
> + memory holes whose memmap has been initialized). When ``spanned_pages`` ==
> + ``pages_with_online_memmap``, ``pfn_to_page()`` can be called on any PFN
> + within the zone span without further ``pfn_valid()`` checks.
> +
> + Note: this counter may temporarily undercount when pages with an online
> + memmap exist outside the current zone span. Growing the zone to cover such
Maybe add here "This can only happen during boot, when initializing the
memmap of pages that do not fall into any zone span."
> + * we will not try to shrink the zones.
s/zone/it/ ?
[...]
> +
> +/*
> + * Initialize unavailable range [spfn, epfn) while accounting only the pages
> + * that fall within the zone span towards pages_with_online_memmap. Pages
> + * outside the zone span are still initialized but not accounted.
> + */
> +static void __init init_unavailable_range_for_zone(struct zone *zone,
> + unsigned long spfn,
> + unsigned long epfn)
Best to use double tab to fit this into a single line
unsigned long spfn, unsigned long epfn)
^ two tabs
> +{
> + int nid = zone_to_nid(zone);
> + int zid = zone_idx(zone);
Both can be const.
> + unsigned long in_zone_start;
> + unsigned long in_zone_end;
> +
> + in_zone_start = clamp(spfn, zone->zone_start_pfn, zone_end_pfn(zone));
> + in_zone_end = clamp(epfn, zone->zone_start_pfn, zone_end_pfn(zone));
> +
> + if (spfn < in_zone_start)
> + init_unavailable_range(spfn, in_zone_start, zid, nid);
> +
> + if (in_zone_start < in_zone_end)
> + zone->pages_with_online_memmap +=
> + init_unavailable_range(in_zone_start, in_zone_end,
> + zid, nid);
Best to use a temporary variable to make this easier to read.
pgcnt = init_unavailable_range(in_zone_start, ...
You can also exceed 80c a bit if it aids readability.
> +
> + if (in_zone_end < epfn)
> + init_unavailable_range(in_zone_end, epfn, zid, nid);
> }
Only nits, hoping we don't miss anything obvious (or any corner case :) ).
If Mike tells us that we are processing all pages during boot
appropriately, this should work.
Thanks!
Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
--
Cheers,
David