Re: [PATCH v3 1/2] init/initramfs.c: do unpacking asynchronously

From: Alexander Egorenkov
Date: Wed Jul 28 2021 - 06:44:45 EST


Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:

> On Tue, Jul 27, 2021 at 04:27:08PM +0200, Bruno Goncalves wrote:
>> On Tue, Jul 27, 2021 at 4:21 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>> >
>> > On Tue, Jul 27, 2021 at 04:12:54PM +0200, Bruno Goncalves wrote:
>> > > On Tue, Jul 27, 2021 at 3:55 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>> > > >
>> > > > On Tue, Jul 27, 2021 at 09:31:54AM +0200, Bruno Goncalves wrote:
>> > > > > On Mon, Jul 26, 2021 at 1:46 PM Rasmus Villemoes
>> > > > > <linux@xxxxxxxxxxxxxxxxxx> wrote:
>> > > > > >
>> > > > > > On 24/07/2021 09.46, Alexander Egorenkov wrote:
>> > > > > > > Hello,
>> > > > > > >
>> > > > > > > since e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously"), we
>> > > > > > > started seeing the following problem on s390 arch regularly:
>> > > > > > >
>> > > > > > > [ 5.039734] wait_for_initramfs() called before rootfs_initcalls
>> > > >
>> > > > So some context here, which might help.
>> > > >
>> > > > The initramfs_cookie is not initialized until a rootfs_initcall() is
>> > > > called, in this case populate_rootfs(). The code is small, so might
>> > > > as well include it:
>> > > >
>> > > > static int __init populate_rootfs(void)
>> > > > {
>> > > >         initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL,
>> > > >                                                  &initramfs_domain);
>> > > >         if (!initramfs_async)
>> > > >                 wait_for_initramfs();
>> > > >         return 0;
>> > > > }
>> > > > rootfs_initcall(populate_rootfs);
>> > > >
>> > > > The warning you see comes from a situation where a wait_for_initramfs()
>> > > > gets called but we haven't yet initialized initramfs_cookie. There are
>> > > > only a few calls for wait_for_initramfs() in the kernel, and the only
>> > > > thing I can think of is that somehow s390 may rely on a usermode helper
>> > > > early on, but not every time.
>> > > >
>> > > > What umh calls does s390 issue?
>> > > >
>> > > > > Unfortunately, we haven't been able to find the root cause, but since
>> > > > > June 23rd we haven't hit this panic...
>> > > > >
>> > > > > Btw, this panic we were hitting only when testing kernels from "scsi"
>> > > > > and "block" trees.
>> > > >
>> > > > Do you use drbd maybe?
>> > >
>> > > No, the machines on which we were able to reproduce the problem don't have drbd.
>> >
>> > Are there *any* umh calls early on boot on the s390 systems? If so
>> > chances are that is the droid you are looking for.
>>
>> Sorry Luis,
>>
>> I was just replying to the question by mentioning an old thread
>> (https://lore.kernel.org/lkml/CA+QYu4qxf2CYe2gC6EYnOHXPKS-+cEXL=MnUvqRFaN7W1i6ahQ@xxxxxxxxxxxxxx/T/#u)
>> on ppc64le.
>>
>> Regarding the "umh", it doesn't show anything on ppc64le boot.
>
> There is not a single pr_*() call in kernel/umh.c, and so unless the
> respective ppc64le / s390 umh callers have a print, we won't know if you
> really did use a print.

I instrumented the UMH code, and it seems that all wait_for_initramfs()
calls are triggered by request_module() from drbg.
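
To make the ordering concrete, here is a rough userspace model of it. This
is only a sketch, not kernel code: the names mirror init/initramfs.c, but
the bool return value, the counter, request_module_model() and the module
name in the usage note are hypothetical stand-ins added for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; nothing here is the real kernel implementation. */
static unsigned long initramfs_cookie;  /* stays 0 until populate_rootfs() */
static bool unpack_done;                /* set once the "unpack" has run */
static int early_wait_calls;            /* hypothetical counter for the model */

/* Stand-in for the asynchronous unpacking work. */
static void do_populate_rootfs(void)
{
	unpack_done = true;
}

/* Stand-in for the rootfs_initcall: only hands out a non-zero cookie. */
static void populate_rootfs(void)
{
	initramfs_cookie = 1;
}

/*
 * Model of wait_for_initramfs(). The kernel function returns void; the
 * bool here is an assumption so that callers in this sketch can see
 * whether the wait actually happened.
 */
static bool wait_for_initramfs(void)
{
	if (!initramfs_cookie) {
		/* Too early: warn once and return without waiting. */
		if (early_wait_calls++ == 0)
			fprintf(stderr,
				"wait_for_initramfs() called before rootfs_initcalls\n");
		return false;
	}
	/* Stand-in for synchronizing on the async cookie. */
	if (!unpack_done)
		do_populate_rootfs();
	return true;
}

/* Hypothetical model of a request_module() caller going through the UMH. */
static bool request_module_model(const char *name)
{
	(void)name;
	return wait_for_initramfs();
}
```

In this model, calling request_module_model("some-module") before
populate_rootfs() prints the warning and returns false, while the same
call after populate_rootfs() waits for the unpack and returns true.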

>
> Can you reproduce the failure? How often?
>
> Luis

The failure can be reproduced almost daily, but only on one particular
test machine, and not immediately: it shows up only after many tests have
run. I instrumented our devel kernel to find out when and how the
initramfs is being corrupted.

Still not reproducible on my own test machine. Very weird.

I'll report back as soon as we have something tangible.

Regards
Alex