Re: [next] [arm64] kernel BUG at arch/arm64/mm/physaddr.c

From: Mark Rutland
Date: Tue Jun 15 2021 - 09:19:11 EST


On Tue, Jun 15, 2021 at 01:47:45PM +0100, Mark Rutland wrote:
> On Tue, Jun 15, 2021 at 04:41:25PM +0530, Naresh Kamboju wrote:
> > The following kernel crash was reported while booting the linux-next
> > 20210615 tag on qemu_arm64 with an allmodconfig build.
> >
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> > [ 0.000000] Linux version 5.13.0-rc6-next-20210615
> > (tuxmake@ac7978cddede) (aarch64-linux-gnu-gcc (Debian 11.1.0-1)
> > 11.1.0, GNU ld (GNU Binutils for Debian) 2.36.50.20210601) #1 SMP
> > PREEMPT Tue Jun 15 10:20:51 UTC 2021
> > [ 0.000000] Machine model: linux,dummy-virt
> > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
> > [ 0.000000] printk: bootconsole [pl11] enabled
> > [ 0.000000] efi: UEFI not found.
> > [ 0.000000] NUMA: No NUMA configuration found
> > [ 0.000000] NUMA: Faking a node at [mem
> > 0x0000000040000000-0x00000000bfffffff]
> > [ 0.000000] NUMA: NODE_DATA [mem 0xbfc00d40-0xbfc03fff]
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] kernel BUG at arch/arm64/mm/physaddr.c:27!
> > [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G T
> > 5.13.0-rc6-next-20210615 #1 c150a8161d8ff395c5ae7ee0c3c8f22c3689fae4
> > [ 0.000000] Hardware name: linux,dummy-virt (DT)
> > [ 0.000000] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > [ 0.000000] pc : __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] lr : __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] sp : ffff800014287b00
> > [ 0.000000] x29: ffff800014287b00 x28: fc49a9b89db36f0a x27: ffffffffffffffff
> > [ 0.000000] x26: 0000000000000280 x25: 0000000000000010 x24: ffff8000145a8000
> > [ 0.000000] x23: 0000000008000000 x22: 0000000000000010 x21: 0000000000000000
> > [ 0.000000] x20: ffff800010000000 x19: ffff00007fc00d40 x18: 0000000000000000
> > [ 0.000000] x17: 00000000003ee000 x16: 00000000bfc12000 x15: 0000001000000000
> > [ 0.000000] x14: 000000000000de8c x13: 0000001000000000 x12: 00000000f1f1f1f1
> > [ 0.000000] x11: dfff800000000000 x10: ffff700002850eea x9 : 0000000000000000
> > [ 0.000000] x8 : ffff00007fbe0d40 x7 : 0000000000000000 x6 : 000000000000003f
> > [ 0.000000] x5 : 0000000000000040 x4 : 0000000000000005 x3 : ffff8000142bb0c0
> > [ 0.000000] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > [ 0.000000] Call trace:
> > [ 0.000000] __phys_addr_symbol+0x44/0xc0
> > [ 0.000000] sparse_init_nid+0x98/0x6d0
>
> From the looks of it, this is pgdat_to_phys, as introduced in next
> commit:
>
> e1db6ef7336d817c ("mm/sparse: fix check_usemap_section_nr warnings")
>
> It appears that allmodconfig doesn't have CONFIG_NEED_MULTIPLE_NODES=y,
> but does have CONFIG_NUMA=y, and so *does* use the dynamically-allocated
> node_data array (since contig_page_data is only defined for !NUMA).
>
> I don't think that commit is correct.

Looking some more, it looks like that's correct in isolation, but it
clashes with commit:

5831eedad2ac6f38 ("mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA")

... and I reckon it'd be clearer and more robust to define
pgdat_to_phys() in the same ifdefs as contig_page_data so that
the two stay in sync, e.g. have:

| #ifdef CONFIG_NUMA
| #define pgdat_to_phys(x) virt_to_phys(x)
| #else /* CONFIG_NUMA */
|
| extern struct pglist_data contig_page_data;
| ...
| #define pgdat_to_phys(x) __pa_symbol(&contig_page_data)
|
| #endif /* CONFIG_NUMA */

... which'd also make clear that contig_page_data is the *only* expected
pglist_data.
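
As a rough illustration of why the choice of helper matters, here is a
user-space sketch (not kernel code). Every mock_* name and MOCK_* constant
is hypothetical and chosen only for illustration; the linear-map arithmetic
assumes a 48-bit VA layout with RAM starting at 0x40000000, which is
consistent with the addresses in the log above but is not taken from the
kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants, for illustration only. */
#define MOCK_IMAGE_START  UINT64_C(0xffff800010000000) /* assumed image start */
#define MOCK_IMAGE_END    UINT64_C(0xffff800014600000) /* made-up image end */
#define MOCK_IMAGE_PHYS   UINT64_C(0x40200000)         /* made-up load address */
#define MOCK_LINEAR_START UINT64_C(0xffff000000000000) /* assumed linear-map base */
#define MOCK_PHYS_BASE    UINT64_C(0x40000000)         /* start of RAM in the log */

/* Stand-in for a "symbol addresses only" bounds check. */
static bool mock_is_image_address(uint64_t va)
{
	return va >= MOCK_IMAGE_START && va <= MOCK_IMAGE_END;
}

/* Stand-in for a linear-map translation, valid for allocated objects. */
static uint64_t mock_virt_to_phys(uint64_t va)
{
	assert(va >= MOCK_LINEAR_START && va < MOCK_IMAGE_START);
	return va - MOCK_LINEAR_START + MOCK_PHYS_BASE;
}

/*
 * Picks the helper the way the proposed ifdef would: the linear-map
 * translation when node data is dynamically allocated (numa == true),
 * the symbol translation otherwise. Returns false where a symbol-only
 * bounds check would reject the address.
 */
static bool mock_pgdat_to_phys(uint64_t va, bool numa, uint64_t *pa)
{
	if (numa) {
		*pa = mock_virt_to_phys(va);
		return true;
	}
	if (!mock_is_image_address(va))
		return false;
	*pa = va - MOCK_IMAGE_START + MOCK_IMAGE_PHYS;
	return true;
}
```

With these assumptions, the x19 value from the register dump
(ffff00007fc00d40) translates to the NODE_DATA address 0xbfc00d40 via the
linear-map path, and is rejected by the symbol-only path.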

Thanks,
Mark.

> Thanks,
> Mark.
>
> > [ 0.000000] sparse_init+0x460/0x4d4
> > [ 0.000000] bootmem_init+0x110/0x340
> > [ 0.000000] setup_arch+0x1b8/0x2e0
> > [ 0.000000] start_kernel+0x110/0x870
> > [ 0.000000] __primary_switched+0xa8/0xb0
> > [ 0.000000] Code: 940ccf23 eb13029f 54000069 940cce60 (d4210000)
> > [ 0.000000] random: get_random_bytes called from
> > oops_exit+0x54/0xc0 with crng_init=0
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > [ 0.000000] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal
> > exception ]---
> >
> > Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org