Re: aarch64 ACPI boot regressed by commit 7ba5f605f3a0 ("arm64/numa: remove the limitation that cpu0 must bind to node0")

From: Andrew Jones
Date: Fri Oct 14 2016 - 09:44:35 EST


On Fri, Oct 14, 2016 at 03:18:00PM +0200, Laszlo Ersek wrote:
> On 10/14/16 10:05, Andrew Jones wrote:
> > On Fri, Oct 14, 2016 at 12:50:29AM +0200, Laszlo Ersek wrote:
> >> (4) Analysis (well, a lame attempt at that, because I have zero
> >> familiarity with this code). Let me quote the patch:
> >>
> >>> commit 7ba5f605f3a0d9495aad539eeb8346d726dfc183
> >>> Author: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
> >>> Date: Thu Sep 1 14:55:04 2016 +0800
> >>>
> >>> arm64/numa: remove the limitation that cpu0 must bind to node0
> >>>
> >>> 1. Remove the old binding code.
> >>> 2. Read the nid of cpu0 from dts.
> >>> 3. Fallback the nid of cpu0 to 0 when numa=off is set in bootargs.
> >>>
> >>> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
> >>> Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
> >>>
> >>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> >>> index c3c08368a685..8b048e6ec34a 100644
> >>> --- a/arch/arm64/kernel/smp.c
> >>> +++ b/arch/arm64/kernel/smp.c
> >>> @@ -624,6 +624,7 @@ static void __init of_parse_and_init_cpus(void)
> >>> }
> >>>
> >>> bootcpu_valid = true;
> >>> + early_map_cpu_to_node(0, of_node_to_nid(dn));
> >>>
> >>> /*
> >>> * cpu_logical_map has already been
> >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>> index 0a15f010b64a..778a985c8a70 100644
> >>> --- a/arch/arm64/mm/numa.c
> >>> +++ b/arch/arm64/mm/numa.c
> >>> @@ -116,16 +116,24 @@ static void __init setup_node_to_cpumask_map(void)
> >>> */
> >>> void numa_store_cpu_info(unsigned int cpu)
> >>> {
> >>> - map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> >>> + map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
> >>> }
> >>>
> >>> void __init early_map_cpu_to_node(unsigned int cpu, int nid)
> >>> {
> >>> /* fallback to node 0 */
> >>> - if (nid < 0 || nid >= MAX_NUMNODES)
> >>> + if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
> >>> nid = 0;
> >
> > The ACPI equivalent code must be missing (at least) the above, because,
> > even with DT, mach-virt won't have cpu to node mappings unless numa
> > is configured on the command line. Can you try adding something like
> >
> > -m 512 -smp 4 \
> > -numa node,mem=256M,cpus=0-1,nodeid=0 \
> > -numa node,mem=256M,cpus=2-3,nodeid=1
> >
> > to your QEMU command line?
>
> I added the following to my domain XML, under <cpu>:
>
> <numa>
> <cell id='0' cpus='0-1' memory='2097152' unit='KiB'/>
> <cell id='1' cpus='2-3' memory='2097152' unit='KiB'/>
> </numa>
>
> (See <http://libvirt.org/formatdomain.html#elementsCPU>.)
>
> With that, each NUMA node gets half of the VCPUs and half of the guest RAM.
>
> (This is in a different guest now, one that has a bleeding edge Fedora kernel -- I didn't want to rebuild the upstream kernel yet again, just for this test. So, "4.9.0-0.rc0.git7.1.fc26.aarch64" is based on upstream v4.8-14109-g1573d2c, and it reproduces the problem too.)
>
> > Then when you boot with ACPI you'll get a
> > SRAT.
>
> Yes, that's confirmed by the guest kernel log (see below).
>
> > If that works, then we're just missing the "no SRAT, nid = 0"
> > code (that should have been added with this patch)
>
> It still crashes with the SRAT, with the following log:

Rats.

> Note the warning message (from wq_numa_init()):
>
> workqueue: NUMA node mapping not available for cpu0, disabling NUMA support
>
> Something looks genuinely broken with the cpu <-> numa-node associations in the ACPI case -- it even seems to fail when the SRAT does exist.
>
> So, perhaps, commit 7ba5f605f3a0 may not have introduced the bug, only exposed one in the ACPI code?...

The kernel's ACPI NUMA support used to work. It was the test case for
QEMU's SRAT generation code.

Two more experiments I wouldn't mind having you try, if you have time;
1) confirm that this NUMA configured guest and this latest kernel works
with DT. This is just for sanity, but I guess it will.
2) If (1) succeeds, then try the last-known-good kernel with ACPI again,
but this time with the NUMA configured guest.

If (2) fails then we may need to expand the bisection :-(

Thanks,
drew