Re: [PATCH v2] arm64: Add memory hotplug support

From: Jonathan Cameron
Date: Thu Dec 13 2018 - 09:25:58 EST


On Wed, 12 Dec 2018 11:49:23 +0000
Robin Murphy <robin.murphy@xxxxxxx> wrote:

> On 12/12/2018 11:42, Jonathan Cameron wrote:
> > On Tue, 11 Dec 2018 18:48:48 +0000
> > Robin Murphy <robin.murphy@xxxxxxx> wrote:
> >
> >> Wire up the basic support for hot-adding memory. Since memory hotplug
> >> is fairly tightly coupled to sparsemem, we tweak pfn_valid() to also
> >> cross-check the presence of a section in the manner of the generic
> >> implementation, before falling back to memblock to check for no-map
> >> regions within a present section as before. By having arch_add_memory()
> >> create the linear mapping first, this then makes everything work in the
> >> way that __add_section() expects.
> >>
> >> We expect hotplug to be ACPI-driven, so the swapper_pg_dir updates
> >> should be safe from races by virtue of the global device hotplug lock.
> >>
> >> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
> > Hi Robin,
> >
> > What tree is this against?
> >
> > rodata_full doesn't seem to exist for me on 4.20-rc6.
>
> Sorry, this is now based on the arm64 for-next/core branch - I was
> similarly confused when Will first mentioned rodata_full on v1 ;)
>
> > With v1 I did the 'new node' test and it looked good except for an
> > old cgroups warning that has always been there (and has been on my list
> > to track down for a long time).
>
> Great, thanks for testing!
>
Hi Robin,

For physical memory hotplug (well, sort of: I'm not physically pulling
modules in and out of the machine, so the test exercises the software
path only).

Tested-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>

There is still an issue with a warning from the cpuset cgroups
controller that I reported a while back but haven't followed up on.
That has nothing to do with this set though.

Tested by adding memory both to proximity nodes that already have
memory in them and to nodes that don't. The NUMA node support here is
just the x86 code moved out to a common location, plus an appropriate
SRAT.
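(For anyone reproducing this kind of test: hot-added memory shows up as
memoryN block directories under /sys/devices/system/memory, and the
physical range a block covers follows directly from its index and
block_size_bytes. The sketch below just illustrates that mapping; the
1 GiB block size is an assumption for illustration, matching the usual
arm64 section size, not something this patch dictates.)

```shell
# Sketch: map a sysfs memory block index to the physical address range
# it covers. Block size is read from block_size_bytes on a real system;
# here we assume 1 GiB (typical arm64 SECTION_SIZE) for illustration.
block_size=$((1 << 30))   # assumed 1 GiB blocks
block=4                   # i.e. /sys/devices/system/memory/memory4

start=$((block * block_size))
end=$((start + block_size - 1))

printf 'memory%d: 0x%x-0x%x\n' "$block" "$start" "$end"
# memory4: 0x100000000-0x13fffffff
```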

We are looking at the virtualization use cases as well, but that will
take a while longer.

If we can sneak this in this cycle that would be great!

Thanks,

Jonathan

> Robin.
>
> >
> > Jonathan
> >> ---
> >>
> >> v2: Handle page-mappings-only cases appropriately
> >>
> >> arch/arm64/Kconfig | 3 +++
> >> arch/arm64/mm/init.c | 8 ++++++++
> >> arch/arm64/mm/mmu.c | 17 +++++++++++++++++
> >> arch/arm64/mm/numa.c | 10 ++++++++++
> >> 4 files changed, 38 insertions(+)
> >>
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 4dbef530cf58..be423fda5cec 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -261,6 +261,9 @@ config ZONE_DMA32
> >> config HAVE_GENERIC_GUP
> >> def_bool y
> >>
> >> +config ARCH_ENABLE_MEMORY_HOTPLUG
> >> + def_bool y
> >> +
> >> config SMP
> >> def_bool y
> >>
> >> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> index 6cde00554e9b..4bfe0fc9edac 100644
> >> --- a/arch/arm64/mm/init.c
> >> +++ b/arch/arm64/mm/init.c
> >> @@ -291,6 +291,14 @@ int pfn_valid(unsigned long pfn)
> >>
> >> if ((addr >> PAGE_SHIFT) != pfn)
> >> return 0;
> >> +
> >> +#ifdef CONFIG_SPARSEMEM
> >> + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> >> + return 0;
> >> +
> >> + if (!valid_section(__nr_to_section(pfn_to_section_nr(pfn))))
> >> + return 0;
> >> +#endif
> >> return memblock_is_map_memory(addr);
> >> }
> >> EXPORT_SYMBOL(pfn_valid);
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 674c409a8ce4..da513a1facf4 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -1046,3 +1046,20 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
> >> pmd_free(NULL, table);
> >> return 1;
> >> }
> >> +
> >> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> +int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
> >> + bool want_memblock)
> >> +{
> >> + int flags = 0;
> >> +
> >> + if (rodata_full || debug_pagealloc_enabled())
> >> + flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >> +
> >> + __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> >> + size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
> >> +
> >> + return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> >> + altmap, want_memblock);
> >> +}
> >> +#endif
> >> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >> index 27a31efd9e8e..ae34e3a1cef1 100644
> >> --- a/arch/arm64/mm/numa.c
> >> +++ b/arch/arm64/mm/numa.c
> >> @@ -466,3 +466,13 @@ void __init arm64_numa_init(void)
> >>
> >> numa_init(dummy_numa_init);
> >> }
> >> +
> >> +/*
> >> + * We hope that we will be hotplugging memory on nodes we already know about,
> >> + * such that acpi_get_node() succeeds and we never fall back to this...
> >> + */
> >> +int memory_add_physaddr_to_nid(u64 addr)
> >> +{
> >> + pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >> + return 0;
> >> +}
> >
> >