Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.
This has some disadvantages:
a) existing memory is consumed for that purpose
(eg: ~2MB per 128MB memory section on x86_64)
b) if the whole node is movable then we end up with off-node struct
pages, which have performance drawbacks.
c) there might be no PMD_ALIGNED chunks, so the memmap array gets
populated with base pages.
This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
Vmemmap page tables can map arbitrary memory.
That means we can simply use the beginning of each memory section and
map struct pages there.
The struct pages which back the allocated space then just need to be
treated carefully.
Implementation-wise, we will reuse the vmem_altmap infrastructure to
override the default allocator used by __populate_section_memmap().
Part of the implementation also relies on memory_block structure gaining
a new field which specifies the number of vmemmap_pages at the beginning.
This patch also introduces the following functions:
- vmemmap_init_space: Initializes vmemmap pages by calling
move_pfn_range_to_zone(), calls kasan_add_zero_shadow() for the
vmemmap range, and marks as online as many sections as the vmemmap
pages fully span.
- vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages from the node's
and zone's present_pages.
- vmemmap_deinit_space: Undoes what vmemmap_init_space does.
Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
---
drivers/base/memory.c | 64 ++++++++++++++--
include/linux/memory.h | 8 +-
include/linux/memory_hotplug.h | 13 ++++
include/linux/memremap.h | 2 +-
include/linux/mmzone.h | 7 +-
mm/Kconfig | 5 ++
mm/memory_hotplug.c | 162 ++++++++++++++++++++++++++++++++++++++++-
mm/sparse.c | 2 -
8 files changed, 247 insertions(+), 16 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f209925a5d4e..a5e536a3e9a4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+ unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+ int ret;
+
+ /*
+ * Although vmemmap pages have a different lifecycle than the pages
+ * they describe (they remain until the memory is unplugged), doing
+ * their initialization and accounting at the hot-{online,offline}
+ * stage simplifies things a lot.
+ */
+ if (nr_vmemmap_pages) {
+ ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
+ mem->online_type);
+ if (ret)
+ return ret;
+ }
- return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
+ ret = online_pages(start_pfn + nr_vmemmap_pages,
+ nr_pages - nr_vmemmap_pages, mem->online_type,
+ mem->nid);
+
+ /*
+ * Undo the work if online_pages() fails.
+ */
+ if (ret && nr_vmemmap_pages) {
+ vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
+ vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
+ }
+
+ return ret;
}