Re: [linux-next-6.10-rc6-20240703] Warning at mm/memblock.c:1447

From: Gowans, James
Date: Mon Jul 08 2024 - 06:01:46 EST


Hi Venkat,

On Mon, 2024-07-08 at 15:09 +0530, Venkat Rao Bagalkote wrote:
> Greetings!!!
>
>
> Observing below warning while booting, when fadump is configured with nocma.
>
>
> [    0.061329] ------------[ cut here ]------------
> [    0.061332] WARNING: CPU: 0 PID: 1 at mm/memblock.c:1447
> memblock_alloc_range_nid+0x24c/0x278
> [    0.061337] Modules linked in:
> [    0.061339] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.10.0-rc6-next-20240703-auto #1
> [    0.061341] Hardware name: IBM,9080-HEX POWER10 (architected)
> 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_016) hv:phyp pSeries
> [    0.061342] NIP: c000000002061610 LR: c000000002061424 CTR:
> 0000000000000000
> [    0.061344] REGS: c000000004d2f780 TRAP: 0700 Not tainted
> (6.10.0-rc6-next-20240703-auto)
> [    0.061345] MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR:
> 44000242 XER: 20040010
> [    0.061350] CFAR: c00000000206142c IRQMASK: 0
> [    0.061350] GPR00: c000000002061424 c000000004d2fa20 c0000000015a3d00
> 0000000000000001
> [    0.061350] GPR04: 0000000000000800 00000012c0000000 0000002580000000
> ffffffffffffffff
> [    0.061350] GPR08: 0000000000000000 0000000000000002 c000000002f58c08
> 0000000024000242
> [    0.061350] GPR12: c000000000454408 c000000003010000 c0000000000112ac
> 0000000000000000
> [    0.061350] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [    0.061350] GPR20: 0000000000000000 0000000000000000 0000000000000000
> c00000000149d390
> [    0.061350] GPR24: c00000000200466c ffffffffffffffff 0000002580000000
> 00000012c0000000
> [    0.061350] GPR28: 0000000000000800 0000000000000005 0000000000000000
> 0000000000000000
> [    0.061365] NIP [c000000002061610] memblock_alloc_range_nid+0x24c/0x278
> [    0.061368] LR [c000000002061424] memblock_alloc_range_nid+0x60/0x278
> [    0.061370] Call Trace:
> [    0.061371] [c000000004d2fa20] [c000000004d2fa60] 0xc000000004d2fa60
> (unreliable)
> [    0.061373] [c000000004d2fae0] [c00000000206178c]
> memblock_phys_alloc_range+0x60/0xe4
> [    0.061376] [c000000004d2fb60] [c000000002017a60]
> setup_fadump+0x114/0x244
> [    0.061379] [c000000004d2fbe0] [c000000000010e78]
> do_one_initcall+0x60/0x398
> [    0.061381] [c000000004d2fcc0] [c000000002006b5c]
> do_initcalls+0x12c/0x218
> [    0.061383] [c000000004d2fd70] [c000000002006f28]
> kernel_init_freeable+0x238/0x370
> [    0.061386] [c000000004d2fde0] [c0000000000112d8] kernel_init+0x34/0x26c
> [    0.061388] [c000000004d2fe50] [c00000000000df7c]
> ret_from_kernel_user_thread+0x14/0x1c
> [    0.061389] --- interrupt: 0 at 0x0
> [    0.061390] Code: eb81ffe0 ebc1fff0 ebe1fff8 7c0803a6 7d710120
> 7d708120 4e800020 60000000 4afbf219 60000000 3b800080 4bfffe40
> <0fe00000> e8610068 7f26cb78 38a02900
> [    0.061396] ---[ end trace 0000000000000000 ]---

That newly introduced warning is there to detect incorrect use of the
memblock allocator: specifically, a driver or subsystem doing a memblock
alloc after memblock has already handed all system RAM to the buddy
allocator. It may well have caught such a case here...
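
To make the idea concrete, here is a toy user-space sketch of that kind
of check. It is not the code in mm/memblock.c; every name in it is made
up for illustration, and what the real allocator does after warning is
not modelled (the toy simply fails the allocation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
#define ARENA_SIZE 4096

static size_t early_next;                /* bump pointer of the early allocator */
static bool handed_to_buddy;             /* set once all memory was handed over */
static unsigned int late_alloc_warnings; /* how often the check has fired */

/* Models the boot-time hand-off of all free memory to the page allocator. */
static void toy_free_all(void)
{
	handed_to_buddy = true;
}

/*
 * Models an early allocation with a "used too late" check.
 * Returns an offset into the arena, or (size_t)-1 on failure.
 */
static size_t toy_early_alloc(size_t size)
{
	assert(early_next <= ARENA_SIZE);

	if (handed_to_buddy) {
		/* The early allocator no longer owns any memory: warn and refuse. */
		late_alloc_warnings++;
		fprintf(stderr, "WARNING: early allocator used after hand-off\n");
		return (size_t)-1;
	}

	if (size > ARENA_SIZE - early_next)
		return (size_t)-1;

	size_t off = early_next;
	early_next += size;
	return off;
}
```

Calling toy_early_alloc() before toy_free_all() succeeds quietly; the
same call afterwards bumps late_alloc_warnings and fails.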

I don't have a powerpc system handy to repro your failure, but looking
at the code, it looks like:
1. fadump_setup_param_area() allocs a physical range for
   fw_dump.param_area and zeroes that range.
2. fadump_append_bootargs() marks it as reserved.
But I believe that by this point the memory has already been handed to
the buddy allocator, so that zeroing may be clobbering someone else's
memory: the fadump code incorrectly assumes that it has exclusive use
of this region.
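
As a rough illustration of the hazard (again a toy user-space model with
invented names, not fadump or page allocator code): once pages are owned
by the page allocator, a caller that picks a range on its own and zeroes
it can wipe data belonging to whoever was given that page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
#define TOY_PAGE_SIZE 64
#define TOY_NR_PAGES  4

static unsigned char toy_ram[TOY_NR_PAGES * TOY_PAGE_SIZE];
static bool toy_page_in_use[TOY_NR_PAGES]; /* the page allocator's view */

/* Models the page allocator handing out a free page; returns index or -1. */
static int toy_page_alloc(void)
{
	for (int i = 0; i < TOY_NR_PAGES; i++) {
		if (!toy_page_in_use[i]) {
			toy_page_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Models a late caller that claims a range on its own and zeroes it
 * without consulting the page allocator.
 */
static void toy_late_claim_and_zero(int page)
{
	assert(page >= 0 && page < TOY_NR_PAGES);
	memset(&toy_ram[page * TOY_PAGE_SIZE], 0, TOY_PAGE_SIZE);
}

/* Returns true if the late zeroing destroyed data owned by someone else. */
static bool toy_demo_clobber(void)
{
	int page = toy_page_alloc();
	if (page < 0)
		return false;

	/* The legitimate owner stores something in its page. */
	unsigned char *owner_data = &toy_ram[page * TOY_PAGE_SIZE];
	memcpy(owner_data, "owner", 6);

	/* The late caller zeroes the same range, unaware of the owner. */
	toy_late_claim_and_zero(page);

	return owner_data[0] == 0;
}
```

In this model toy_demo_clobber() reports that the owner's data is gone
after the late zeroing, which is the kind of silent corruption the
warning is meant to flag.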

I may be wildly off, but that was the *intention* of the warning.

Adding PowerPC maintainers here for their opinion on whether fadump is
doing the right thing here or not.

>
>
> cat /proc/cmdline
> BOOT_IMAGE=(ieee1275//vdevice/vfc-client@300000d4/disk@50050768101535e5,msdos3)/boot/vmlinuz-6.10.0-rc6-next-20240703
> root=UUID=2c90ab47-3389-4017-9f06-0c94534fd9cb ro
> crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
> fadump=nocma
>
>
> Reverting the below commit, issue is not seen.
>
>
> Commit ID: 0fa4ac6722127f4aae2ea9813ba246ce2bec8326
>
>
> Regards,
>
> Venkat.
>