Re: [PATCH V5 3/4] x86: Support Generic Initiator only proximity domains

From: Jonathan Cameron
Date: Tue Oct 08 2019 - 07:18:00 EST


On Mon, 7 Oct 2019 16:55:05 +0200
Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> * Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> > Done in a somewhat different fashion to arm64.
> > Here the infrastructure for memoryless domains was already
> > in place. That infrastructure applies just as well to
> > domains that also don't have a CPU, hence it works for
> > Generic Initiator Domains.
> >
> > In common with memoryless domains we only register GI domains
> > if the proximity node is not online. If a domain is already
> > a memory containing domain, or a memoryless domain there is
> > nothing to do just because it also contains a Generic Initiator.
> >
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > ---
> > arch/x86/include/asm/numa.h | 2 ++
> > arch/x86/kernel/setup.c | 1 +
> > arch/x86/mm/numa.c | 14 ++++++++++++++
> > 3 files changed, 17 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> > index bbfde3d2662f..f631467272a3 100644
> > --- a/arch/x86/include/asm/numa.h
> > +++ b/arch/x86/include/asm/numa.h
> > @@ -62,12 +62,14 @@ extern void numa_clear_node(int cpu);
> > extern void __init init_cpu_to_node(void);
> > extern void numa_add_cpu(int cpu);
> > extern void numa_remove_cpu(int cpu);
> > +extern void init_gi_nodes(void);
> > #else /* CONFIG_NUMA */
> > static inline void numa_set_node(int cpu, int node) { }
> > static inline void numa_clear_node(int cpu) { }
> > static inline void init_cpu_to_node(void) { }
> > static inline void numa_add_cpu(int cpu) { }
> > static inline void numa_remove_cpu(int cpu) { }
> > +static inline void init_gi_nodes(void) { }
> > #endif /* CONFIG_NUMA */
> >
> > #ifdef CONFIG_DEBUG_PER_CPU_MAPS
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index cfb533d42371..b6c977907ea5 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1264,6 +1264,7 @@ void __init setup_arch(char **cmdline_p)
> > prefill_possible_map();
> >
> > init_cpu_to_node();
> > + init_gi_nodes();
> >
> > io_apic_init_mappings();
> >
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 4123100e0eaf..50bf724a425e 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -733,6 +733,20 @@ static void __init init_memory_less_node(int nid)
> > */
> > }
> >
> > +/*
> > + * Generic Initiator Nodes may have neither CPU nor Memory.
> > + * At this stage if either of the others were present we would
> > + * already be online.
> > + */
> > +void __init init_gi_nodes(void)
> > +{
> > +	int nid;
> > +
> > +	for_each_node_state(nid, N_GENERIC_INITIATOR)
> > +		if (!node_online(nid))
> > +			init_memory_less_node(nid);
> > +}
>
> Nit: missing curly braces.

Good point.
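
For illustration only, here is roughly the shape the braced loop could
take. This is a userspace sketch, not the next version of the patch: the
gi_node[] and online_node[] arrays, MAX_NODES, and the int return value
are all assumptions made so the snippet compiles standalone, and the
mocked init_memory_less_node() only marks the node online, which is less
than the real function does.

```c
#include <stdbool.h>

#define MAX_NODES 8	/* hypothetical limit, stands in for MAX_NUMNODES */

/* Userspace stand-ins for the kernel node state masks (illustrative only). */
static bool gi_node[MAX_NODES];		/* mock of the N_GENERIC_INITIATOR state */
static bool online_node[MAX_NODES];	/* mock of the online state */

static bool node_online(int nid)
{
	return online_node[nid];
}

/* Mock: the real function does more than flag the node as online. */
static void init_memory_less_node(int nid)
{
	online_node[nid] = true;
}

/*
 * Same logic as the patch, with braces around the multi-line loop body.
 * Returns the number of nodes brought online; the patch itself returns
 * void, the count is only here to make the sketch easy to check.
 */
static int init_gi_nodes(void)
{
	int nid, brought_up = 0;

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!gi_node[nid])
			continue;

		/* A GI node that already has CPUs or memory is online: skip it. */
		if (!node_online(nid)) {
			init_memory_less_node(nid);
			brought_up++;
		}
	}

	return brought_up;
}
```

With nodes 0 and 1 already online and node 2 marked as GI only, a single
call brings up just node 2 and a second call does nothing.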

>
> How do these work in practice, will a system that only had nodes 0-1
> today grow a third node '2' that won't have any CPUs or memory on them?

Yes, exactly that. The result is that fallback lists etc. work when
_PXM is used to assign a device into that new node. The interesting
bit comes when a driver goes further and queries the NUMA distances
from SLIT. At that point the driver can elect to do load balancing
across multiple nodes at similar distances.
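
As a rough sketch of that kind of policy (not taken from any existing
driver): in the kernel the distances would come from node_distance();
here the table, mock_node_distance(), pick_near_nodes() and the slack
parameter are all hypothetical, chosen so the example builds in
userspace. Node 2 plays the role of the GI-only node.

```c
#include <stddef.h>

#define NR_NODES 3	/* hypothetical: nodes 0-1 plus GI-only node 2 */

/* Made-up SLIT-style table: distance[from][to], 10 means local. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 15 },
	{ 20, 10, 16 },
	{ 15, 16, 10 },
};

/* Stand-in for the kernel's node_distance(). */
static int mock_node_distance(int from, int to)
{
	return distance[from][to];
}

/*
 * Collect the nodes other than dev_node whose distance from dev_node is
 * within 'slack' of the nearest one. Returns how many were written to out.
 */
static size_t pick_near_nodes(int dev_node, int slack, int *out, size_t max)
{
	int best = -1, nid;
	size_t n = 0;

	/* First pass: find the smallest distance to any other node. */
	for (nid = 0; nid < NR_NODES; nid++) {
		int d;

		if (nid == dev_node)
			continue;
		d = mock_node_distance(dev_node, nid);
		if (best < 0 || d < best)
			best = d;
	}

	/* Second pass: keep every node close enough to that minimum. */
	for (nid = 0; nid < NR_NODES && n < max; nid++) {
		if (nid == dev_node)
			continue;
		if (mock_node_distance(dev_node, nid) <= best + slack)
			out[n++] = nid;
	}

	return n;
}
```

With this table a device in node 2 and a slack of 1 gets both node 0 and
node 1 back, so work could be spread across the two; with a slack of 0
only node 0 qualifies.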

In theory you can also specify a device you wish to put into the node
via the SRAT entry (IIRC using segment + BDF for PCI devices), but
for now I haven't implemented that method.

>
> Thanks,
>
> Ingo

Thanks,

Jonathan