Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

From: Guillaume Tucker
Date: Fri Dec 18 2020 - 17:04:32 EST


On 13/12/2020 08:23, Mike Rapoport wrote:
> Hi Guillaume,
>
> On Fri, Dec 11, 2020 at 09:53:46PM +0000, Guillaume Tucker wrote:
>> Hi Mike,
>>
>> Please see the bisection report below about a boot failure on
>> rk3288 with next-20201210.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> There's nothing in the serial console log, probably because it's
>> crashing too early during boot. This was confirmed on two rk3288
>> platforms on kernelci.org: rk3288-veyron-jaq and
>> rk3288-rock2-square. There's no clear sign about other platforms
>> being impacted.
>>
>> If this looks like something you want to investigate but you
>> don't have a platform at hand to reproduce it, please let us know
>> if you would like the test to be re-run on kernelci.org with some
>> debug config turned on, or if you have a fix to try.
>
> I'd appreciate it if you could build a working kernel with
> CONFIG_DEBUG_MEMORY_INIT=y and run it with
>
> memblock=debug mminit_loglevel=4
>
> in the command line.
>
> If I understand correctly, DEBUG_LL is not an option for these platforms
> so if earlyprintk didn't display the log there is not much to do about
> it.

OK, sorry for the delay. I've built a kernel and booted it as
you requested, and also found that the issue was due to this
memory area defined in arch/arm/boot/dts/rk3288.dtsi:

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/*
		 * The rk3288 cannot use the memory area above 0xfe000000
		 * for dma operations for some reason. While there is
		 * probably a better solution available somewhere, we
		 * haven't found it yet and while devices with 2GB of ram
		 * are not affected, this issue prevents 4GB from booting.
		 * So to make these devices at least bootable, block
		 * this area for the time being until the real solution
		 * is found.
		 */
		dma-unusable@fe000000 {
			reg = <0x0 0xfe000000 0x0 0x1000000>;
		};
	};

So I've put a hack[1] on top of 950c37691925 to skip adding a
node in memblock_enforce_memory_reserved_overlap() if the base
address is 0xfe000000, which got the kernel booting. Here's the
console log:

https://people.collabora.com/~gtucker/tmp/2966825.txt

and the full test job details, if this helps:

https://lava.collabora.co.uk/scheduler/job/2966825


I haven't really looked much further than that, but I'll be
available on Monday to help run other tests if needed.

Thanks,
Guillaume

[1] https://people.collabora.com/~gtucker/tmp/2966825.patch