Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

From: Michal Hocko
Date: Tue Mar 17 2020 - 10:01:29 EST


On Tue 17-03-20 14:44:45, Vlastimil Babka wrote:
> On 3/16/20 10:06 AM, Michal Hocko wrote:
> > On Thu 12-03-20 17:41:58, Vlastimil Babka wrote:
> > [...]
> >> with nid present in:
> >> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some online
> >
> > I would rather have a dummy pgdat for those. Have a look at
> > $ git grep "NODE_DATA.*->" | wc -l
> > 63
> >
> > Who knows how many else we have there. I haven't looked more closely.
> > Besides that what is a real reason to not have pgdat ther and force all
> > users of a $random node from those that the platform considers possible
> > for special casing? Is that a memory overhead? Is that really a thing?
>
> I guess we can ignore memory overhead. I guess there only might be some concern
> that for nodes that are initially offline, we will allocate the pgdat on a
> different node, and after they are online, it will stay on a different node with
> more access latency from local cpus. If we only allocate for online nodes, it
> can always be local? But I guess it doesn't matter that much.

This is not the case even now because of chicke&egg. You need a memory
to allocate from and that memory has to be managed somewhere per node
(pgdat). Keep in mind we do not have the bootmem allocator for the
hotplug. Have a look at hotadd_new_pgdat and when it is called. There
are some attempts to allocate memmap from the hotpluged memory but I am
not sure we can do the whole thing without pgdat in place. If we can
then can come up with some replace the pgdat magic. But still I am not
even sure this is something we really have to optimize for.
--
Michal Hocko
SUSE Labs