[PATCH v1 00/12] Make MAX_ORDER adjustable as a kernel boot time parameter.
From: Zi Yan
Date: Wed Sep 21 2022 - 21:13:18 EST
From: Zi Yan <ziy@xxxxxxxxxx>
Hi all,
This patchset adds support for a kernel boot time adjustable MAX_ORDER, so that
users can change the largest page size the buddy allocator allocates.
It is on top of mm-everything-2022-09-19-00-45.
Changelog
===
From RFCv2
1. Dropped RFC, collected reviewed-by.
2. Added back page validation check in find_buddy_page_pfn() since it is
needed when zone is not contiguous.
3. Converted MAX_ORDER sized static array used in recently added kmsan code to
a dynamic one.
Motivation
===
This enables the kernel to allocate 1GB pages and is necessary for my ongoing
work on adding support for 1GB PUD THP[1]. This is also the conclusion I came to
after some discussion with David Hildenbrand on what methods should be used for
allocating gigantic pages[2], since other approaches like using the CMA
allocator or alloc_contig_pages() are regarded as suboptimal.
In addition, making MAX_ORDER a kernel boot time parameter lets users adjust
the buddy allocator for their own needs without recompiling the kernel, so that
one can still have a small MAX_ORDER if one does not need to allocate gigantic
pages like 1GB PUD THPs.
Background
===
At the moment, the kernel imposes the MAX_ORDER - 1 + PAGE_SHIFT <
SECTION_SIZE_BITS restriction. This prevents the buddy allocator from merging
pages across memory sections, as PFNs might not be contiguous and code like
page++ would fail. But this is not an issue when SPARSEMEM_VMEMMAP is set, since
all struct pages are virtually contiguous. So a boot time adjustable MAX_ORDER
depends on SPARSEMEM_VMEMMAP.
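To make the arithmetic behind this restriction concrete, here is a minimal
userspace sketch. The constants are assumptions for illustration (4KB base
pages and 128MB memory sections, as commonly found on x86_64), and the helper
names are hypothetical, not kernel functions. It shows that a 1GB page needs
order 18, which the restriction as stated above does not allow.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only: assumed 4KB base pages and 128MB sections. */
#define EX_PAGE_SHIFT        12
#define EX_SECTION_SIZE_BITS 27

/*
 * Smallest page order whose block covers `bytes`.
 * Hypothetical helper; `bytes` must be at least one base page.
 */
static unsigned int order_for_size(unsigned long long bytes)
{
	unsigned int order = 0;

	assert(bytes >= (1ULL << EX_PAGE_SHIFT));
	while ((1ULL << (order + EX_PAGE_SHIFT)) < bytes)
		order++;
	return order;
}

/*
 * The restriction as written in the text, with max_order in its current
 * meaning (largest allocatable order is max_order - 1).
 */
static bool fits_in_section(unsigned int max_order)
{
	assert(max_order >= 1);
	return max_order - 1 + EX_PAGE_SHIFT < EX_SECTION_SIZE_BITS;
}
```

With these assumed values, order_for_size(1ULL << 30) is 18, so MAX_ORDER in
its current meaning would have to be at least 19. fits_in_section(11) holds,
while fits_in_section(19) does not, which is why the restriction has to be
lifted under SPARSEMEM_VMEMMAP.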
Description
===
I tested the patchset on both x86_64 and ARM64 at 4KB base pages. The systems
boot and run.
Regarding the concerns about performance degradation if MAX_ORDER is increased,
I ran vm-scalability from lkp comparing the current system, my patchset with
MAX_ORDER=11 and my patchset with MAX_ORDER=20 on an x86_64 VM and saw
almost no performance difference. Please see the vm-scalability reports in the
RFCv2: https://lore.kernel.org/linux-mm/20220811231643.1012912-1-zi.yan@xxxxxxxx/
Patch 1 changes MAX_ORDER to represent the max order of pages allocated
by the buddy allocator. Right now MAX_ORDER - 1 represents that, which is
confusing. Suggested by Vlastimil Babka. checkpatch.pl is updated to
warn about future uses of MAX_ORDER, since its semantics have changed.
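The off-by-one nature of this change can be sketched as follows. This is an
illustration only: the helper names are hypothetical, and the per-order list
count under the new meaning is an assumption that follows from "one list per
order from 0 to the largest order", not something quoted from the patch.

```c
#include <assert.h>

/* Old meaning: MAX_ORDER is one more than the largest allocatable order. */
static unsigned int largest_order_old(unsigned int max_order)
{
	assert(max_order >= 1);
	return max_order - 1;
}

/* New meaning: MAX_ORDER is the largest allocatable order itself. */
static unsigned int largest_order_new(unsigned int max_order)
{
	return max_order;
}

/* Number of per-order slots needed to cover orders 0..largest. */
static unsigned int nr_orders_old(unsigned int max_order)
{
	return max_order;
}

static unsigned int nr_orders_new(unsigned int max_order)
{
	return max_order + 1;
}
```

Under these assumptions an old value of 11 and a new value of 10 describe the
same allocator: both give a largest order of 10 and 11 per-order slots. Loops
written as "order < MAX_ORDER" under the old meaning correspond to
"order <= MAX_ORDER" under the new one.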
Patch 2 adds a page validity check in find_buddy_page_pfn() when the zone is
not contiguous, since some pages in the middle of a zone can be invalid.
Patch 3 makes deferred struct page initialization work when MAX_ORDER is
bigger than a memory section size.
Patches 4-7 convert uses of MAX_ORDER to pageblock_order, since
pageblock_order stays constant when MAX_ORDER can be changed at boot time
and is close to the current MAX_ORDER value. I split the changes into separate
patches for easier review and can merge them into a single one if that works
better.
Patch 8 replaces MAX_ORDER with MAX_PHYS_CONTIG_ORDER when it is used to
indicate the maximum number of physically contiguous pages.
Patch 9 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.
Patch 10 converts statically allocated arrays with MAX_ORDER length to dynamic
ones if possible and prepares for making MAX_ORDER a boot time parameter.
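The general shape of such a conversion can be sketched in userspace as below.
The struct and function names are hypothetical, and calloc()/free() stand in
for whatever kernel allocator the real patch uses. The point is only that an
array previously declared with a compile-time MAX_ORDER length is instead
sized once at init time from a runtime value.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical per-order statistics. With a compile-time MAX_ORDER this
 * would have been a fixed-length array member.
 */
struct order_stats {
	unsigned int nr_orders;   /* number of slots, orders 0..max_order */
	unsigned long *count;     /* dynamically sized, zero-initialized */
};

/* Allocate one slot per order; returns 0 on success, -1 on failure. */
static int order_stats_init(struct order_stats *s, unsigned int max_order)
{
	assert(s != NULL);
	s->nr_orders = max_order + 1;
	s->count = calloc(s->nr_orders, sizeof(*s->count));
	if (!s->count) {
		s->nr_orders = 0;
		return -1;
	}
	return 0;
}

/* Release the array and reset the struct to an empty state. */
static void order_stats_free(struct order_stats *s)
{
	assert(s != NULL);
	free(s->count);
	s->count = NULL;
	s->nr_orders = 0;
}
```

The cost of this design is an extra allocation and a pointer indirection per
access, which is why some users may prefer a compile-time bound instead of a
dynamic array.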
Patch 11 adds a new MIN_MAX_ORDER constant to replace the soon-to-be-dynamic
MAX_ORDER in places where converting a static array to a dynamic one is a
hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.
Patch 12 changes MAX_ORDER to be a kernel boot time parameter and it is
opt-in as an mm/Kconfig option.
Any suggestions and/or comments are welcome. Thanks.
[1] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@xxxxxxxx/
[2] https://lore.kernel.org/linux-mm/e132fdd9-65af-1cad-8a6e-71844ebfe6a2@xxxxxxxxxx/
Zi Yan (12):
mm: rectify MAX_ORDER semantics to be the largest page order from
buddy allocator
mm: check page validity when find a buddy page in a non-contiguous
zone
mm: adapt deferred struct page init to new MAX_ORDER.
mm: prevent pageblock size being larger than section size.
fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
mm/page_reporting: set page_reporting_order to -1 to prevent it
running
mm: replace MAX_ORDER when it is used to indicate max physical
contiguity.
mm: Make MAX_ORDER of buddy allocator configurable via Kconfig
SET_MAX_ORDER.
mm: convert MAX_ORDER sized static arrays to dynamic ones.
mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time
constant.
mm: make MAX_ORDER a kernel boot time parameter.
.../admin-guide/kdump/vmcoreinfo.rst | 4 +-
.../admin-guide/kernel-parameters.txt | 9 +-
arch/Kconfig | 4 +
arch/arc/Kconfig | 4 +-
arch/arm/Kconfig | 12 +-
arch/arm/configs/imx_v6_v7_defconfig | 2 +-
arch/arm/configs/milbeaut_m10v_defconfig | 2 +-
arch/arm/configs/oxnas_v6_defconfig | 2 +-
arch/arm/configs/pxa_defconfig | 2 +-
arch/arm/configs/sama7_defconfig | 2 +-
arch/arm/configs/sp7021_defconfig | 2 +-
arch/arm64/Kconfig | 16 +--
arch/arm64/include/asm/sparsemem.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 2 +-
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 2 +-
arch/csky/Kconfig | 2 +-
arch/ia64/Kconfig | 8 +-
arch/ia64/include/asm/sparsemem.h | 4 +-
arch/ia64/mm/hugetlbpage.c | 2 +-
arch/loongarch/Kconfig | 16 +--
arch/m68k/Kconfig.cpu | 8 +-
arch/mips/Kconfig | 22 ++-
arch/nios2/Kconfig | 10 +-
arch/powerpc/Kconfig | 30 ++---
arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +-
arch/powerpc/configs/fsl-emb-nonhw.config | 2 +-
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
arch/powerpc/mm/hugetlbpage.c | 2 +-
arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
arch/sh/configs/ecovec24_defconfig | 2 +-
arch/sh/mm/Kconfig | 20 ++-
arch/sparc/Kconfig | 8 +-
arch/sparc/kernel/pci_sun4v.c | 2 +-
arch/sparc/kernel/traps_64.c | 2 +-
arch/sparc/mm/tsb.c | 4 +-
arch/um/kernel/um_arch.c | 4 +-
arch/xtensa/Kconfig | 8 +-
drivers/base/regmap/regmap-debugfs.c | 8 +-
drivers/crypto/hisilicon/sgl.c | 6 +-
.../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +-
drivers/gpu/drm/ttm/ttm_device.c | 7 +-
drivers/gpu/drm/ttm/ttm_pool.c | 72 ++++++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
drivers/irqchip/irq-gic-v3-its.c | 4 +-
drivers/md/dm-bufio.c | 2 +-
drivers/misc/genwqe/card_utils.c | 2 +-
.../net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
drivers/net/ethernet/ibm/ibmvnic.h | 2 +-
drivers/video/fbdev/hyperv_fb.c | 6 +-
drivers/virtio/virtio_balloon.c | 2 +-
drivers/virtio/virtio_mem.c | 8 +-
fs/proc/kcore.c | 2 +-
fs/ramfs/file-nommu.c | 2 +-
include/drm/ttm/ttm_pool.h | 4 +-
include/linux/hugetlb.h | 2 +-
include/linux/mmzone.h | 36 ++++-
include/linux/pageblock-flags.h | 21 ++-
include/linux/slab.h | 8 +-
kernel/crash_core.c | 2 +-
kernel/dma/pool.c | 8 +-
kernel/events/ring_buffer.c | 2 +-
mm/Kconfig | 33 ++++-
mm/compaction.c | 8 +-
mm/debug_vm_pgtable.c | 4 +-
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 4 +-
mm/internal.h | 8 +-
mm/kmsan/init.c | 18 ++-
mm/memblock.c | 8 +-
mm/memory.c | 4 +-
mm/memory_hotplug.c | 6 +-
mm/page_alloc.c | 127 +++++++++++++-----
mm/page_isolation.c | 14 +-
mm/page_owner.c | 6 +-
mm/page_reporting.c | 8 +-
mm/shuffle.h | 2 +-
mm/slab.c | 2 +-
mm/slub.c | 6 +-
mm/vmscan.c | 1 -
mm/vmstat.c | 14 +-
net/smc/smc_ib.c | 2 +-
scripts/checkpatch.pl | 8 ++
security/integrity/ima/ima_crypto.c | 2 +-
tools/testing/memblock/linux/mmzone.h | 6 +-
84 files changed, 462 insertions(+), 272 deletions(-)
--
2.35.1