[PATCH v2 00/18] kho: make boot time huge page allocation work nicely with KHO

From: Pratyush Yadav

Date: Fri Jun 05 2026 - 14:51:39 EST


From: "Pratyush Yadav (Google)" <pratyush@xxxxxxxxxx>

Hi,

Boot-time gigantic huge page allocation is currently somewhat broken with KHO.

First, they break scratch size accounting. Since they are allocated
using the memblock alloc APIs, they count towards RSRV_KERN, and thus
towards the scratch size when using scratch_scale. This means that if
huge pages take a large enough chunk of system memory, the scratch size
will blow up and scratch will fail to allocate.
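
To make the accounting problem concrete, here is a minimal userspace
sketch. It assumes scratch is sized as a percentage of kernel-reserved
memory; the function name, its parameters and the exact formula are
illustrative assumptions, not the kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical model of scratch sizing with scratch_scale.
 * Assumption: scratch = reserved-by-kernel bytes * scale_percent / 100.
 * Caller must ensure hugetlb_bytes <= rsrv_kern_bytes.
 */
static uint64_t model_scratch_size(uint64_t rsrv_kern_bytes,
                                   uint64_t hugetlb_bytes,
                                   unsigned int scale_percent,
                                   int exclude_hugetlb)
{
    uint64_t base = rsrv_kern_bytes;

    /* With the exclusion, boot-time huge pages no longer inflate scratch. */
    if (exclude_hugetlb)
        base -= hugetlb_bytes;

    return base * scale_percent / 100;
}
```

For example, with 1 GiB of ordinary kernel reservations plus 48 GiB of
gigantic pages and a scale of 200, the model asks for 98 GiB of scratch
when huge pages are counted, and 2 GiB when they are excluded. On a
64 GiB machine the former can never be satisfied.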

Second, scratch cannot contain preserved memory, so if hugepages are
allocated from scratch, they will fail to be preserved with the upcoming
hugetlb preservation series [0].

Fix this by introducing the concept of extended scratch areas. They are
areas that the kernel discovers on boot by walking the radix tree and
finding free memory ranges. See patch 15 for more details.
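
The idea of finding free ranges can be sketched in userspace as computing
the gaps between preserved ranges. This is only an illustration: it
assumes the preserved ranges are already available as a sorted,
non-overlapping array, whereas the series walks the KHO radix tree. The
struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end). Hypothetical type. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Write the gaps between preserved ranges inside [mem_start, mem_end)
 * to out[] and return how many were written. preserved[] must be sorted
 * by start and non-overlapping. Gaps beyond out_cap are dropped.
 */
static size_t find_free_ranges(const struct range *preserved, size_t n,
                               uint64_t mem_start, uint64_t mem_end,
                               struct range *out, size_t out_cap)
{
    uint64_t cursor = mem_start;
    size_t count = 0;

    for (size_t i = 0; i < n && cursor < mem_end; i++) {
        uint64_t s = preserved[i].start;
        uint64_t e = preserved[i].end;

        if (e <= cursor)
            continue;       /* entirely below the cursor */
        if (s > mem_end)
            s = mem_end;    /* clip to the window */
        if (s > cursor && count < out_cap) {
            out[count].start = cursor;
            out[count].end = s;
            count++;
        }
        cursor = e;         /* skip over the preserved range */
    }

    /* Tail gap after the last preserved range. */
    if (cursor < mem_end && count < out_cap) {
        out[count].start = cursor;
        out[count].end = mem_end;
        count++;
    }
    return count;
}
```

With preserved ranges [0x2000, 0x3000) and [0x5000, 0x6000) inside a
window of [0x1000, 0x8000), this yields three candidate areas:
[0x1000, 0x2000), [0x3000, 0x5000) and [0x6000, 0x8000).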

Discovering the scratch areas needs some preparatory changes to KHO, the
radix tree APIs, and to memblock. Patches 1-14 do that.

Patch 15 adds the scratch discovery logic.

Patch 16 adds the dedicated memblock hugetlb allocator.

Patches 17-18 fix the scratch size calculation when using scratch_scale.

[0] https://lore.kernel.org/linux-mm/20251206230222.853493-1-pratyush@xxxxxxxxxx/T/#u

Changes in v2:

Detailed changelog below.

At a high level, the major change in this version is the removal of
MEMBLOCK_KHO_SCRATCH_EXT. MEMBLOCK_KHO_SCRATCH stays the only memory
type, and the discovered areas are marked with it. For HugeTLB, there is
now a dedicated allocation routine that retries if the allocated memory
lands in scratch. MEMBLOCK_RSRV_HUGETLB is also introduced to help with
accounting of scratch area sizes.
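
One way such a retry could look is sketched below in userspace. The toy
top-down allocator, the overlap helper and the retry policy (lower the
limit to the start of the scratch region that was hit) are all
assumptions for illustration; memblock_alloc_hugetlb() in the series may
work differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open region [start, end). Hypothetical type. */
struct region {
    uint64_t start;
    uint64_t end;
};

static bool region_overlaps(const struct region *r, uint64_t start,
                            uint64_t end)
{
    return start < r->end && r->start < end;
}

/*
 * Toy top-down allocator: highest aligned address in [base, limit) that
 * fits size bytes. Returns 0 on failure, so base must be non-zero, and
 * size and align must be non-zero.
 */
static uint64_t toy_alloc_top_down(uint64_t base, uint64_t limit,
                                   uint64_t size, uint64_t align)
{
    uint64_t addr;

    if (limit < size || limit - size < base)
        return 0;
    addr = (limit - size) / align * align;
    return addr >= base ? addr : 0;
}

/*
 * Allocate, and if the result lands in a scratch region, retry below
 * that region. The limit strictly decreases on every retry, so the loop
 * terminates.
 */
static uint64_t alloc_outside_scratch(uint64_t base, uint64_t limit,
                                      uint64_t size, uint64_t align,
                                      const struct region *scratch,
                                      size_t nr_scratch)
{
    for (;;) {
        uint64_t addr = toy_alloc_top_down(base, limit, size, align);
        bool hit = false;

        if (!addr)
            return 0;
        for (size_t i = 0; i < nr_scratch; i++) {
            if (region_overlaps(&scratch[i], addr, addr + size)) {
                limit = scratch[i].start;
                hit = true;
                break;
            }
        }
        if (!hit)
            return addr;
    }
}
```

For a 1 GiB page in [256 MiB, 4 GiB) with scratch at
[0xC0000000, 0xD0000000), the first candidate at 3 GiB overlaps scratch,
so the sketch retries and returns 2 GiB instead.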

- Fixup commit message in patch 1 to make namespacing change clearer.
- Use @key in kernel-doc for radix functions.
- Add a runtime check on key width.
- Move all mem retrieval logic to kho_mem_retrieve().
- Add a comment in kho_mem_retrieve() explaining why mem_map won't be NULL.
- Rename callbacks to ->leaf() and ->node().
- Fixup commit messages.
- Clear tree->root in kho_radix_destroy_tree(). This lets the tree be
  re-initialized by calling kho_radix_init_tree().
- Add kho_get_mem_map() earlier in the series.
- Export kho_scratch_overlap() and use it in memblock_is_kho_scratch_memory().
- Get rid of MEMBLOCK_KHO_SCRATCH_EXT.
- Introduce MEMBLOCK_RSRV_HUGETLB.
- Introduce memblock_alloc_hugetlb() for hugetlb bootmem allocations.
- Refactor memblock_reserved_kern_size() to allow calculating size by flags.
- Exclude hugetlb memory from scratch size calculation.
- Collect R-bys.

Regards,
Pratyush Yadav

Pratyush Yadav (Google) (18):
kho: generalize radix tree APIs
kho: disallow wide keys in radix tree
kho: return virtual address of mem_map
kho: store incoming radix tree in kho_in
kho: move all memory retrieval logic to kho_mem_retrieve()
kho: add a struct for radix callbacks
kho: add callback for table pages
kho: add data argument to radix walk callback
kho: allow early-boot usage of the KHO radix tree
kho: allow destroying KHO radix tree
kho: add kho_radix_init_tree()
kho: export kho_scratch_overlap()
kho: initialize kho_scratch pointer earlier in boot
memblock: use kho_scratch_overlap() to decide migratetype
kho: extend scratch
memblock: make HugeTLB bootmem allocation work with KHO
memblock: allow calculating reserved size by flags
kho: exclude hugetlb memory from scratch size calculation

include/linux/kexec_handover.h | 10 +
include/linux/kho/abi/kexec_handover.h | 8 +
include/linux/kho_radix_tree.h | 44 +-
include/linux/memblock.h | 9 +-
kernel/liveupdate/Makefile | 1 -
kernel/liveupdate/kexec_handover.c | 495 +++++++++++++++-----
kernel/liveupdate/kexec_handover_debug.c | 25 -
kernel/liveupdate/kexec_handover_internal.h | 9 -
mm/hugetlb.c | 22 +-
mm/memblock.c | 120 ++++-
mm/mm_init.c | 1 +
11 files changed, 540 insertions(+), 204 deletions(-)
delete mode 100644 kernel/liveupdate/kexec_handover_debug.c


base-commit: 2935777b418d2bfcbfe96705bb2c0fa6c0d94e18
--
2.54.0.1032.g2f8565e1d1-goog