Re: Panic on ppc64le using kernel 5.13.0-rc3

From: Rasmus Villemoes
Date: Fri Jun 11 2021 - 03:14:27 EST


On 10/06/2021 17.14, Bruno Goncalves wrote:
> On Thu, Jun 10, 2021 at 3:02 PM Rasmus Villemoes
> <linux@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 10/06/2021 13.47, Bruno Goncalves wrote:
>>> Hello,
>>>
>>> We've observed in some cases kernel panic when trying to boot on
>>> ppc64le using a kernel based on 5.13.0-rc3. We are not sure if it
>>> could be related to patch
>>> https://lore.kernel.org/lkml/20210313212528.2956377-2-linux@xxxxxxxxxxxxxxxxxx/
>>>
>>
>> Thanks for the report. It's possible, but I'll need some help from you
>> to get more info.
>>
>> First, can you send me the .config?
>
> The .config is on
> https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2021/06/09/317881801/build_ppc64le_redhat:1332368174/kernel-block-ppc64le-d3f02e52f5548006f04358d407bbb7fe51255c41.config

Thanks.

>>
>>>
>>> [ 1.516075] wait_for_initramfs() called before rootfs_initcalls
>>
>> This is likely because you have CONFIG_UEVENT_HELPER_PATH set to some
>> non-empty path (/sbin/hotplug perhaps). This did get reported once before:
>>
>
> CONFIG_UEVENT_HELPER_PATH is not set. In the .config we have "#
> CONFIG_UEVENT_HELPER is not set"

OK. Then I assume some quite early initcall does a request_module() or
request_firmware() (or similar). I don't think this matters - that call
would be done before the initramfs was unpacked with or without my
patch, so it won't find anything in the empty rootfs. It's just that my
patch added a note. But just to figure out where that triggers, can you do

- pr_warn_once("wait_for_initramfs() called before rootfs_initcalls\n");
+ WARN_ONCE(1, "wait_for_initramfs() called before rootfs_initcalls\n");

in init/initramfs.c.

>>> [ 1.764430] Initramfs unpacking failed: no cpio magic
>>
>> Whoa, that's not good. Did something scramble over the initramfs memory
>> while it was being unpacked? It's been .2 seconds since the start of the
>> unpacking, so it's unlikely the very beginning of the initramfs is corrupt.
>>
>> Can you try booting with initramfs_async=0 on the command line and see
>> if the kernel still crashes?
>
> We are not able to reproduce it 100% of the time, but sure I can try
> with this option and see what happens.
>
> We've also seen:
> Initramfs unpacking failed: junk within compressed archive
>
> This can be seen on the other 2 console logs that I provided the link to.

Yes, I saw that. This, and the fact that it's not 100% reproducible, is
consistent with the problem being some race that happens to write over
the compressed initramfs image - sometimes the decompressor can still
make sense of the bits, but the output is no longer a valid cpio
archive, and sometimes the decompressor itself already notices the
corruption.
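
To illustrate the first of those two failure modes: "no cpio magic" just
means that a header in the unpacked stream did not start with the six
ASCII characters that are expected there. Below is a minimal userspace
sketch of that idea, not the code in init/initramfs.c. The names
has_newc_magic() and flip_bit() are made up for the example, and it
assumes the "newc" magic "070701".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CPIO_NEWC_MAGIC "070701"	/* assumed: "newc" format magic */

/* Return 1 if buf starts with the newc magic, 0 otherwise. */
int has_newc_magic(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	return len >= 6 && memcmp(buf, CPIO_NEWC_MAGIC, 6) == 0;
}

/* Flip a single bit at byte offset off, as a stray write might. */
void flip_bit(unsigned char *buf, size_t off, unsigned int bit)
{
	assert(buf != NULL);
	buf[off] ^= (unsigned char)(1u << (bit & 7));
}
```

Calling flip_bit() on any of the first six bytes of an otherwise valid
header makes has_newc_magic() return 0, so a single stray bit in the
wrong place is enough to produce that message.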

I wonder if there is some way to mark the pages occupied by the
compressed initramfs as read-only - which would hopefully trigger a nice
crash with a backtrace pointing to whoever writes to that memory.
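
As a rough userspace analogy of what I have in mind (only a sketch, not
kernel code, and not a proposal for how to do it in the kernel): map a
page, fill it with stand-in data, make it read-only with mprotect(), and
let a stray write fault instead of silently corrupting the data. The
helper name stray_write_is_caught() is made up for the example.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Fault handler: jump back to just before the stray write. */
static void on_fault(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/*
 * Hypothetical helper: put stand-in "initramfs" data in one page, mark
 * the page read-only, then attempt a stray write to it.
 * Returns 1 if the write faulted, 0 if it went through, -1 on setup error.
 */
int stray_write_is_caught(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old_segv, old_bus;
	volatile unsigned char *p;
	volatile int faulted = 0;
	void *buf;

	assert(page > 0);
	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memcpy(buf, "070701", 6);	/* stand-in for the real image */
	if (mprotect(buf, (size_t)page, PROT_READ) != 0) {
		munmap(buf, (size_t)page);
		return -1;
	}

	/* Catch the fault so the example can report it and continue. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	p = buf;
	if (sigsetjmp(fault_env, 1) == 0)
		p[0] = 'X';		/* the stray write */
	else
		faulted = 1;

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	munmap(buf, (size_t)page);
	return faulted;
}
```

In the kernel the interesting part would of course be the backtrace from
the fault rather than recovering from it, but the principle is the same:
the offending write gets caught at the moment it happens.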

Rasmus