Re: [RFC v3] sched/topology: fix kernel crash when a CPU is hotplugged in a memoryless node

From: Peter Zijlstra
Date: Fri Mar 15 2019 - 08:26:11 EST


On Fri, Mar 15, 2019 at 12:12:45PM +0100, Laurent Vivier wrote:

> Another way to avoid the offline nodes overlapping at startup is to
> ensure the default values don't define a distance that merges all
> offline nodes into node 0.
>
> A powerpc-specific patch can work around the kernel crash by doing this:
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 87f0dd0..3ba29bb 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -623,6 +623,7 @@ static int __init parse_numa_properties(void)
>  	struct device_node *memory;
>  	int default_nid = 0;
>  	unsigned long i;
> +	int nid, dist;
>
>  	if (numa_enabled == 0) {
>  		printk(KERN_WARNING "NUMA disabled by user\n");
> @@ -636,6 +637,10 @@ static int __init parse_numa_properties(void)
>
>  	dbg("NUMA associativity depth for CPU/Memory: %d\n",
>  	    min_common_depth);
>
> +	for (nid = 0; nid < MAX_NUMNODES; nid++)
> +		for (dist = 0; dist < MAX_DISTANCE_REF_POINTS; dist++)
> +			distance_lookup_table[nid][dist] = nid;
> +
>  	/*
>  	 * Even though we connect cpus to numa domains later in SMP
>  	 * init, we need to know the node ids now. This is because

What does that actually do? That is, what does it make the distance
table look like before and after you bring up the CPUs?
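As I read arch/powerpc/mm/numa.c, __node_distance() walks the two rows of
distance_lookup_table level by level and doubles the distance at each level
where the entries differ. Here is a standalone sketch of that logic (the
main() harness, the toy MAX_NUMNODES value and the simplification to the
form1-affinity path are mine; the table walk mirrors the kernel code):

/* Standalone sketch (not the kernel source itself) of the form1-affinity
 * path of powerpc's __node_distance(); kernel names, toy MAX_NUMNODES. */
#include <stdio.h>

#define MAX_NUMNODES		8	/* toy value for the sketch */
#define MAX_DISTANCE_REF_POINTS	4	/* as in arch/powerpc/mm/numa.c */
#define LOCAL_DISTANCE		10

static int distance_ref_points_depth = MAX_DISTANCE_REF_POINTS;
static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];

static int node_distance(int a, int b)
{
	int i, distance = LOCAL_DISTANCE;

	for (i = 0; i < distance_ref_points_depth; i++) {
		if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
			break;
		distance *= 2;	/* double per differing NUMA level */
	}
	return distance;
}

int main(void)
{
	int nid, dist;

	/* zeroed table (what the BSS gives you before any init) */
	printf("before: d(0,1)=%d\n", node_distance(0, 1));	/* -> 10 */

	/* the init loop from the patch above */
	for (nid = 0; nid < MAX_NUMNODES; nid++)
		for (dist = 0; dist < MAX_DISTANCE_REF_POINTS; dist++)
			distance_lookup_table[nid][dist] = nid;

	printf("after:  d(0,1)=%d\n", node_distance(0, 1));	/* -> 160 */
	return 0;
}

With the table zeroed, any two nodes compare equal at the first level and
report LOCAL_DISTANCE, which would be the merging into node 0 you describe;
after your loop, two distinct nids differ at every level and come out at
LOCAL_DISTANCE << MAX_DISTANCE_REF_POINTS.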

> Any comment?

Well, I had a few questions here:

20190305115952.GH32477@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

that I've not yet seen answers to.