Re: [PATCH v3 1/2] init/initramfs.c: do unpacking asynchronously

From: Bruno Goncalves
Date: Tue Jul 27 2021 - 10:50:28 EST


On Tue, Jul 27, 2021 at 4:42 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> On Tue, Jul 27, 2021 at 04:27:08PM +0200, Bruno Goncalves wrote:
> > On Tue, Jul 27, 2021 at 4:21 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 27, 2021 at 04:12:54PM +0200, Bruno Goncalves wrote:
> > > > On Tue, Jul 27, 2021 at 3:55 PM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jul 27, 2021 at 09:31:54AM +0200, Bruno Goncalves wrote:
> > > > > > On Mon, Jul 26, 2021 at 1:46 PM Rasmus Villemoes
> > > > > > <linux@xxxxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On 24/07/2021 09.46, Alexander Egorenkov wrote:
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > since e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously"), we
> > > > > > > > started seeing the following problem on s390 arch regularly:
> > > > > > > >
> > > > > > > > [ 5.039734] wait_for_initramfs() called before rootfs_initcalls
> > > > >
> > > > > So some context here, which might help.
> > > > >
> > > > > The initramfs_cookie is not initialized until a rootfs_initcall(),
> > > > > in this case populate_rootfs(), is called. The code is small, so I
> > > > > might as well include it:
> > > > >
> > > > > static int __init populate_rootfs(void)
> > > > > {
> > > > > 	initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL,
> > > > > 						 &initramfs_domain);
> > > > > 	if (!initramfs_async)
> > > > > 		wait_for_initramfs();
> > > > > 	return 0;
> > > > > }
> > > > > rootfs_initcall(populate_rootfs);
> > > > >
> > > > > The warning you see comes from wait_for_initramfs() being called
> > > > > before initramfs_cookie has been initialized. There are only a few
> > > > > callers of wait_for_initramfs() in the kernel, and the only thing I
> > > > > can think of is that s390 somehow relies on a usermode helper early
> > > > > on, but not every time.
> > > > >
> > > > > What umh calls does s390 issue?
> > > > >
> > > > > > Unfortunately, we haven't been able to find the root cause, but since
> > > > > > June 23rd we haven't hit this panic...
> > > > > >
> > > > > > Btw, we were hitting this panic only when testing kernels from the
> > > > > > "scsi" and "block" trees.
> > > > >
> > > > > Do you use drbd maybe?
> > > >
> > > > No, the machines on which we were able to reproduce the problem don't have drbd.
> > >
> > > Are there *any* umh calls early on boot on the s390 systems? If so
> > > chances are that is the droid you are looking for.
> >
> > Sorry Luis,
> >
> > I was just replying to the question by mentioning an old thread
> > (https://lore.kernel.org/lkml/CA+QYu4qxf2CYe2gC6EYnOHXPKS-+cEXL=MnUvqRFaN7W1i6ahQ@xxxxxxxxxxxxxx/T/#u)
> > on ppc64le.
> >
> > Regarding "umh", nothing related shows up during the ppc64le boot.
>
> There is not a single pr_*() call in kernel/umh.c, so unless the
> respective ppc64le / s390 umh callers print something, we won't know
> whether a usermode helper was really used.
>
> Can you reproduce the failure? How often?

We were able to reproduce the ppc64le panic often on specific
machines, but the last time we hit it was on June 23rd, when we
tested commit 444ef33be31f3c27ea24e60d5d9f2de9247d64be from
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
Since then we haven't hit the panic.

Bruno

>
> Luis
>