Re: [PATCH RFC 1/2] mm/memory_hotplug: no need to init new pgdat with node_start_pfn

From: Michal Hocko
Date: Wed Apr 22 2020 - 04:21:09 EST


On Tue 21-04-20 15:06:20, David Hildenbrand wrote:
> On 21.04.20 14:52, Michal Hocko wrote:
> > On Tue 21-04-20 14:35:12, David Hildenbrand wrote:
> >> On 21.04.20 14:30, Michal Hocko wrote:
> >>> Sorry for the late reply
> >>>
> >>> On Thu 16-04-20 12:47:06, David Hildenbrand wrote:
> >>>> A hotadded node/pgdat will span no pages at all, until memory is moved to
> >>>> the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
> >>>> when onlining memory blocks. We don't have to initialize the
> >>>> node_start_pfn to the memory we are adding.
> >>>
> >>> You are right that the node is empty at this phase but that is already
> >>> reflected by zero present pages (hmm, I do not see spanned pages to be
> >>> set 0 though). What I am missing here is why this is an improvement. The
> >>> new node is already visible here and I do not see why we hide the
> >>> information we already know.
> >>
> >> "information we already know" - no, not before we online the memory.
> >
> > Is this really the case? All add_memory_resource users operate on a
> > physical memory range.
>
> Having the first add_memory() to magically set node_start_pfn of a hotplugged
> node isn't dangerous, I think we agree on that. It's just completely
> unnecessary here and at least left me confused why this is needed at all-
> because the node start/end pfn is only really touched when
> onlining/offlining memory (when resizing the zone and the pgdat).

I do not see any specific problem. It just feels odd to
ignore the start pfn when we have that information. I am little bit
worried that this might kick back. E.g. say we start using the memmaps
from the hotplugged memory then the initial part of the node will never
get online and we would have memmaps outside of the node span. I do not
see an immediate problem except for the feeling this is odd.

That being said I will shut up now and leave it alone.

[...]
> > Btw. one thing that I have in my notes, I was never able to actually
> > test the no numa node case. Because I have always been testing with node
> > being allocated during the boot. Do you have any way to trigger this
> > path?
>
> Sure, here is my test case
>
> #! /bin/bash
> sudo qemu-system-x86_64 \
> --enable-kvm \
> -m 4G,maxmem=20G,slots=2 \
> -smp sockets=2,cores=2 \
> -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,mem=0G \

I have been using a similar command line
NUMA_ONE_MEMORY_LESS_NODE="-numa node,mem=2G -numa node,mem=0G"
which gets appended to the qemu cmdline. I have always thought that this
would allocate pgdat for node 1 though. I have checked that again now
and dmesg doesn't point to node 1 anywhere.

Thanks!

--
Michal Hocko
SUSE Labs