Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519
From: Mike Rapoport
Date: Tue Nov 08 2022 - 02:58:34 EST
Hi Yajun,
On Tue, Nov 08, 2022 at 02:27:53AM +0000, Yajun Deng wrote:
> Hi Sachin,
> I don't have a powerpc machine, so I don't know why this happened.
>
> Hi Mike,
> Do you have any suggestions?
You can try reproducing the bug in qemu, or work with Sachin to debug the
issue.
> I tested in tools/testing/memblock, and it was successful.
The memblock tests still provide limited coverage and don't deal with all
possible cases.
For now I'm dropping this patch from the memblock tree until the issue is
fixed.
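
For reference, the trace below stops in memblock_merge_regions(), and as far
as I can tell the check there asserts that neighbouring regions never overlap
(the end of one region must not exceed the base of the next). A minimal
user-space sketch of that invariant follows. The names struct region,
regions_sorted_disjoint() and merge_regions() are hypothetical stand-ins, not
the kernel's code, and the node id and flags comparison that the kernel also
performs before merging is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memblock region: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Return true if the array is sorted by base and no region overlaps its
 * successor. Touching regions (end == next base) are allowed.
 */
static bool regions_sorted_disjoint(const struct region *r, size_t cnt)
{
	for (size_t i = 0; i + 1 < cnt; i++)
		if (r[i].base + r[i].size > r[i + 1].base)
			return false;
	return true;
}

/*
 * Merge touching neighbours in place and return the new region count.
 * The assert models the overlap check assumed to be the one that fires
 * in the report; an unsorted or overlapping array trips it.
 */
static size_t merge_regions(struct region *r, size_t cnt)
{
	size_t i = 0;

	while (i + 1 < cnt) {
		struct region *cur = &r[i];
		struct region *nxt = &r[i + 1];

		if (cur->base + cur->size != nxt->base) {
			/* Not adjacent: must at least be disjoint. */
			assert(cur->base + cur->size <= nxt->base);
			i++;
			continue;
		}

		/* Adjacent: fold nxt into cur and close the gap. */
		cur->size += nxt->size;
		memmove(nxt, nxt + 1, (cnt - (i + 2)) * sizeof(*nxt));
		cnt--;
	}
	return cnt;
}
```

Running regions_sorted_disjoint() over the array right before the merge step
is one way to see whether an insertion left the array unsorted or overlapping
on a given memory layout.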
> November 6, 2022 8:07 PM, "Sachin Sant" <sachinp@xxxxxxxxxxxxx> wrote:
>
> > While booting recent linux-next on an IBM Power10 server LPAR, the
> > following crash is observed:
> >
> > [ 0.000000] numa: Partition configured for 32 NUMA nodes.
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] kernel BUG at mm/memblock.c:519!
> > [ 0.000000] Oops: Exception in kernel mode, sig: 5 [#1]
> > [ 0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3-next-20221104 #1
> > [ 0.000000] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries
> > [ 0.000000] NIP: c0000000004ba240 LR: c0000000004bb240 CTR: c0000000004ba210
> > [ 0.000000] REGS: c000000002a8b7b0 TRAP: 0700 Not tainted (6.1.0-rc3-next-20221104)
> > [ 0.000000] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 24042424 XER: 00000001
> > [ 0.000000] CFAR: c0000000004ba290 IRQMASK: 1
> > [ 0.000000] GPR00: c0000000004bb240 c000000002a8ba50 c00000000136ee00 c0000010f3ac00a8
> > [ 0.000000] GPR04: 0000000000000000 c0000010f3ac0090 00000010f3ac0000 0000000000000d00
> > [ 0.000000] GPR08: 0000000000000001 0000000000000007 0000000000000001 0000000000000081
> > [ 0.000000] GPR12: c0000000004ba210 c000000002e10000 0000000000000000 000000000000000d
> > [ 0.000000] GPR16: 000000000f6be620 000000000f6be8e8 000000000f6be788 000000000f6bed58
> > [ 0.000000] GPR20: 000000000f6f6d58 c0000000029a8de8 00000010f3ad8800 0000000000000080
> > [ 0.000000] GPR24: 00000010f3ad7b00 0000000000000000 0000000000000100 0000000000000d00
> > [ 0.000000] GPR28: 00000010f3ad7b00 c0000000029a8de8 c0000000029a8e00 0000000000000006
> > [ 0.000000] NIP [c0000000004ba240] memblock_merge_regions.isra.12+0x40/0x130
> > [ 0.000000] LR [c0000000004bb240] memblock_add_range+0x190/0x300
> > [ 0.000000] Call Trace:
> > [ 0.000000] [c000000002a8ba50] [0000000000000100] 0x100 (unreliable)
> > [ 0.000000] [c000000002a8ba90] [c0000000004bb240] memblock_add_range+0x190/0x300
> > [ 0.000000] [c000000002a8bb10] [c0000000004bb5e0] memblock_reserve+0x70/0xd0
> > [ 0.000000] [c000000002a8bba0] [c000000002045234] memblock_alloc_range_nid+0x11c/0x1e8
> > [ 0.000000] [c000000002a8bc60] [c0000000020453a4] memblock_alloc_internal+0xa4/0x110
> > [ 0.000000] [c000000002a8bcb0] [c0000000020456cc] memblock_alloc_try_nid+0x94/0xcc
> > [ 0.000000] [c000000002a8bd40] [c00000000200b570] alloc_paca_data+0x7c/0xcc
> > [ 0.000000] [c000000002a8bdb0] [c00000000200b770] allocate_paca+0x8c/0x28c
> > [ 0.000000] [c000000002a8be50] [c00000000200a26c] setup_arch+0x1c4/0x4d8
> > [ 0.000000] [c000000002a8bed0] [c000000002004378] start_kernel+0xb4/0xa84
> > [ 0.000000] [c000000002a8bf90] [c00000000000da90] start_here_common+0x1c/0x20
> > [ 0.000000] Instruction dump:
> > [ 0.000000] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7c7d1b78 7c9e2378 3be00000 f8010010
> > [ 0.000000] f821ffc1 e9230000 3969ffff 4800000c <0b0a0000> 7d3f4b78 393f0001 7fbf5840
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000]
> > [ 0.000000] Kernel panic - not syncing: Fatal exception
> > [ 0.000000] Rebooting in 180 seconds..
> >
> > This problem was introduced with next-20221101. Git bisect points to
> > the following patch:
> >
> > commit 3f82c9c4ac377082e1230f5299e0ccce07b15e12
> > Date: Tue Oct 25 15:09:43 2022 +0800
> > memblock: don't run loop in memblock_add_range() twice
> >
> > Reverting this patch helps boot the kernel to login prompt.
> >
> > Have attached .config
> >
> > - Sachin
--
Sincerely yours,
Mike.