Re: RISC-V reserved memory problems
From: Mike Rapoport
Date: Tue Mar 07 2023 - 06:36:32 EST
Hi Conor,
Sorry for the delay, somehow this slipped between the cracks.
On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
> Hullo Palmer, Mike & whoever else may read this,
>
> Just reviving this thread from a little while ago as I have been in the
> area again recently...
TBH, I didn't really dig deep into the issues, but the thought I had was
what if DT was mapped via fixmap until the setup_vm_final() and then it
would be possible to call DT methods early.
Could be I'm shooting in the dark :)
> On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> > Hey all,
> > We've run into a bit of a problem with reserved memory on PolarFire, or
> > more accurately a pair of problems that seem to have opposite fixes.
> >
> > The first of these problems is triggered when trying to implement a
> > remoteproc driver. To get the reserved memory buffer, remoteproc
> > does an of_reserved_mem_lookup(), something like:
> >
> > np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> > if (!np)
> > return -EINVAL;
> >
> > rmem = of_reserved_mem_lookup(np);
> > if (!rmem)
> > return -EINVAL;
> >
> > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> > a match - but this was triggering kernel panics for us. We did some
> > debugging and found that the name string's pointer was pointing to an
> > address in the 0x4000_0000 range. The minimum reproduction for this
> > crash is attached - it hacks in some print_reserved_mem()s into
> > setup_vm_final() around a tlb flush so you can see the before/after.
> > (You'll need a reserved memory node in your dts to replicate)
> >
> > The output is like so, with the same crash as in the remoteproc driver:
> >
> > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
>
> [...]
>
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >
> > We traced this back to early_init_fdt_scan_reserved_mem() in
> > setup_bootmem() - moving it later back up the boot sequence to
> > after the dt has been remapped etc has fixed the problem for us.
> >
> > The least movement to get it working is attached, and also pushed
> > here: git.kernel.org/conor/c/1735589baefc
>
> This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved
> memory setup").
>
> > The second problem is a bit more complicated to explain - but we
> > found the solution conflicted with the remoteproc fix as we had
> > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> > process to solve this one.
> >
> > We want to have a node in our devicetree that contains some memory
> > that is non-cached & marked as reserved-memory. Maybe we have just
> > missed something, but from what we've seen:
> > - the really early setup looks at the dtb, picks the highest bit
> > of memory and puts the dtb etc there so it can start using it
> > - early_init_fdt_scan_reserved_mem() is then called, which figures
> > out if memory is reserved or not.
> >
> > Unfortunately, the highest bit of memory is the non-cached bit so
> > everything falls over, but we can avoid this by moving the call to
> > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> > takes place right before it in setup_bootmem().
> >
> > Obviously, both of these changes are moving the function call in
> > opposite directions and we can only really do one of them. We are not
> > sure if what we are doing with the non-cached reserved-memory section
> > is just not permitted & cannot work - or if this is something that
> > was overlooked for RISC-V specifically and works for other archs.
>
> We ended up working around this one by making sure that U-Boot loaded
> the dtb to somewhere that would be inside the kernel's memory map, thus
> avoiding the remapping in the first place.
>
> We did run into another problem recently though, and 50e63dd8ed92 is
> kinda at fault for it.
> This particular issue was encountered with a devicetree where the
> top-most memory region was entirely reserved & was not observed prior
> to my fix for the first issue.
>
> On RISC-V, the boot sequence is something like:
> setup_bootmem();
> setup_vm_final();
> unflatten_device_tree();
> early_init_fdt_scan_reserved_mem();
>
> Whereas, before my patch it used to be (give-or-take):
> setup_bootmem();
> early_init_fdt_scan_reserved_mem();
> setup_vm_final();
> unflatten_device_tree();
>
> The difference being that we used to have scanned the reserved memory
> regions before calling setup_vm_final() & therefore know which regions
> we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem()
> before we've got the dt in a proper virtual memory address will cause
> the kernel to panic if it tries to read a reserved memory node's label.
>
> As we are now calling setup_vm_final() *before* we know what the
> reserved memory regions are & as RISC-V allocates memblocks from the top
> down, the allocations in setup_vm_final() will be done in the highest
> memory region.
> When early_init_fdt_scan_reserved_mem() then tries to reserve the
> entirety of that top-most memory region, the reservation fails as part
> of this region has already been allocated.
> In the scenario where I found this bug, that top-most region is non-
> cached memory & the kernel ends up panicking.
> The memblock debug code made this pretty easy to spot, otherwise I'd
> probably have spent more than just a few hours trying to figure out why
> it was panicking!
>
> My "this needs to be fixed today" solution for this problem was calling
> memblock_set_bottom_up(true) in setup_bootmem() & that's what we are
> going to carry downstream for now.
>
> I haven't tested it (yet) but I suspect that it would also fix our
> problem of the dtb being remapped into a non-cached region of memory
> that we would later go on to reserve too. Non-cached being an issue
> mainly due to the panicking, but failing to reserve (and using!) memory
> regions that are meant to be reserved is very far from ideal even when
> they are memory that the kernel can actually use.
>
> I have no idea if that is an acceptable solution for upstream though, so
> I guess this is me putting out feelers as to whether this is something I
> should send a patch to do *OR* if this is another sign of the issues
> that you (Mike, Palmer) mentioned in the past.
> If it isn't an acceptable solution, I'm not really too sure how to
> proceed!
>
> Cheers,
> Conor.
>
--
Sincerely yours,
Mike.