Hi Conor,
On 8/16/22 22:41, Conor.Dooley@xxxxxxxxxxxxx wrote:
Hey all,
We've run into a bit of a problem with reserved memory on PolarFire, or
more accurately a pair of problems that seem to have opposite fixes.
The first of these problems is triggered when trying to implement a
remoteproc driver. To get the reserved memory buffer, remoteproc
does an of_reserved_mem_lookup(), something like:
np = of_parse_phandle(pdev->of_node, "memory-region", 0);
if (!np)
return -EINVAL;
rmem = of_reserved_mem_lookup(np);
if (!rmem)
return -EINVAL;
of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
a match - but this was triggering kernel panics for us. We did some
debugging and found that the name string's pointer was pointing to an
address in the 0x4000_0000 range. The minimum reproduction for this
0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that is used to map the dtb before we can access it using the linear mapping.
crash is attached - it hacks in some print_reserved_mem()s into
setup_vm_final() around a tlb flush so you can see the before/after.
(You'll need a reserved memory node in your dts to replicate)
The output is like so, with the same crash as in the remoteproc driver:
[ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
[ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
[ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
[ 0.000000] printk: bootconsole [ns16550a0] enabled
[ 0.000000] printk: debug: skip boot console de-registration.
[ 0.000000] efi: UEFI not found.
[ 0.000000] before flush
[ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
[ 0.000000] after flush
[ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
You take the trap here because the mapping for the dtb does not exist in swapper_pg_dir, but you don't need this mapping anymore as you can access the device tree through the linear mapping now.
I would say that: you build your kernel with CONFIG_BUILTIN_DTB and then you don't call early_init_dt_verify which resets initial_boot_params to the linear mapping address (it was initially set to 0x4000_0000 in parse_dtb). If that's the case, does the following fix your issue?
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 376d2827e736..2b09f0bd8432 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p)
efi_init();
paging_init();
#if IS_ENABLED(CONFIG_BUILTIN_DTB)
+ initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
unflatten_and_copy_device_tree();
#else
if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
[ 0.000000] Oops [#1]
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
[ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
[ 0.000000] epc : string+0x4a/0xea
[ 0.000000] ra : vsnprintf+0x1e4/0x336
[ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
[ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
[ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
[ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
[ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
[ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
[ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
[ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
[ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
[ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
[ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
[ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
[ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
[ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
[ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
[ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
[ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
[ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
[ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
[ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
[ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
[ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
We traced this back to early_init_fdt_scan_reserved_mem() in
setup_bootmem() - moving it later back up the boot sequence to
after the dt has been remapped etc has fixed the problem for us.
The least movement to get it working is attached, and also pushed
here: git.kernel.org/conor/c/1735589baefc
The second problem is a bit more complicated to explain - but we
found the solution conflicted with the remoteproc fix as we had
to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
process to solve this one.
We want to have a node in our devicetree that contains some memory
that is non-cached & marked as reserved-memory. Maybe we have just
missed something, but from what we've seen:
- the really early setup looks at the dtb, picks the highest bit
of memory and puts the dtb etc there so it can start using it
- early_init_fdt_scan_reserved_mem() is then called, which figures
out if memory is reserved or not.
Unfortunately, the highest bit of memory is the non-cached bit so
everything falls over, but we can avoid this by moving the call to
early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
takes place right before it in setup_bootmem().
And then I suppose the allocations you are mentioning happen in unflatten_XXX, so parsing the device tree for reserved memory nodes before this should do the trick. Does the following fix your second issue?
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 2b09f0bd8432..94b3d049fe9d 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p)
paging_init();
#if IS_ENABLED(CONFIG_BUILTIN_DTB)
initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
+ early_init_fdt_scan_reserved_mem();
unflatten_and_copy_device_tree();
#else
- if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
+ if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) {
+ early_init_fdt_scan_reserved_mem();
unflatten_device_tree();
- else
+ } else
pr_err("No DTB found in kernel mappings\n");
#endif
- early_init_fdt_scan_reserved_mem();
misc_mem_init();
init_resources();
Obviously, both of these changes are moving the function call in
opposite directions and we can only really do one of them. We are not
sure if what we are doing with the non-cached reserved-memory section
is just not permitted & cannot work - or if this is something that
was overlooked for RISC-V specifically and works for other archs.
It does seem like the first issue is a real bug, and I am happy to
submit the patch for that whenever - but having two problems with
opposite fixes seemed as if there was something else lurking that we
just don't have enough understanding to detect.
Any help would be great!
Thanks,
Conor.
Even if that does not fix your issue, the first patch is necessary as it fixes initial_boot_params.
_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv