Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

From: Zeno Davatz
Date: Thu Jan 19 2023 - 12:37:29 EST


Dear Bjorn

On Thu, Jan 19, 2023 at 6:00 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> [+cc bjorn@xxxxxxxxxxx to avoid spamassassin]
>
> On Wed, Jan 18, 2023 at 06:04:58PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 06, 2023 at 05:42:33PM +0100, Zeno Davatz wrote:
> > > On Fri, Dec 30, 2022 at 7:50 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > On Wed, Dec 28, 2022 at 12:42:34PM -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> > > > > > On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > > > > > >
> > > > > > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > > > > > Kernel Version: 6.1-rc1
> > > > > > > > ...
> > > > > > >
> > > > > > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > > > > > see attachments.
> > > > > > > >
> > > > > > > > The enumeration works fine with Kernel 6.0 and below.
> > > > > > > >
> > > > > > > > Same problem still exists with v6.1 and v6.2-rc1.
> > > > > > >
> > > > > > > Thank you very much for your report, Zeno!
> > > > > > >
> > > > > > > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > > > > > > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > > > > > > when it hangs?
> > > > > >
> > > > > > I will try this after January 7th, 2023.
> > >
> > > I updated the issue:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > >
> > > I booted with the option: "ignore_loglevel initcall_debug"
> >
> > Thanks! There's so much pcie output in that picture that we can't see
> > any of the initcall logging. Can you capture another movie, but use
> > kernel parameters like "ignore_loglevel initcall_debug boot_delay=100"
> > to slow things down? The full-speed boot is too fast for the camera
> > to capture all the output. You can do this on any convenient kernel
> > that hangs.
>
> Thanks for the new movie! The last initcalls I see before the hang
> are:
>
> init_mqueue_fs
> key_proc_init
> jent_mod_init
>
> We must have returned from jent_mod_init() because I think the "saving
> config space" messages we see at the hang are from
> pcie_portdrv_init().
>
> I built 833477fce7a1 ("Merge tag 'sound-6.1-rc1' of
> git://git.kernel.org/pub/scl) with your .config and when I boot it on
> qemu, I see this:
>
> calling jent_mod_init+0x0/0x32 @ 1
> initcall jent_mod_init+0x0/0x32 returned 0 after 27185 usecs
> calling af_alg_init+0x0/0x45 @ 1
> NET: Registered PF_ALG protocol family
> ...
> calling sg_pool_init+0x0/0xb4 @ 1
> initcall sg_pool_init+0x0/0xb4 returned 0 after 462 usecs
> calling pcie_portdrv_init+0x0/0x43 @ 1
> pcieport 0000:00:1c.0: vgaarb: pci_notify
> pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
> pcieport 0000:00:1c.0: enabling bus mastering
> pcieport 0000:00:1c.0: PME: Signaling with IRQ 24
> pcieport 0000:00:1c.0: AER: enabled with IRQ 24
> pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0x34208086)
> pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100507)
> pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x6040002)
> ...
>
> Would you mind trying again with "boot_delay=1000 pcie_ports=compat"?
>
> "boot_delay=1000" should slow it down more (all the action is in the
> last 3 seconds and it's still hard to see) and "pcie_ports=compat"
> should turn off the PCIe port driver.

Done. Please see:

https://bugzilla.kernel.org/show_bug.cgi?id=216859#c42
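
As a side note, the check described in the quoted text above (find the
initcall that was "calling ..." but never logged "initcall ... returned")
can be sketched in Python once the photographed output has been typed
into a text file. This is an illustration only: the function name
pending_initcalls and the SAMPLE lines are hypothetical, and the regexes
assume the line shapes seen in the qemu log quoted above.

```python
import re

# Line shapes assumed from the initcall_debug output quoted in this thread.
CALL_RE = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RET_RE = re.compile(r"initcall\s+(\S+)\s+returned\s+(-?\d+)\s+after\s+(\d+)\s+usecs")


def pending_initcalls(log_lines):
    """Return initcalls that were called but never reported as returned."""
    pending = []
    for line in log_lines:
        returned = RET_RE.search(line)
        if returned:
            name = returned.group(1)
            if name in pending:
                pending.remove(name)
            continue
        called = CALL_RE.search(line)
        if called:
            pending.append(called.group(1))
    return pending


# Hypothetical sample, shortened from the qemu log quoted above.
SAMPLE = [
    "calling jent_mod_init+0x0/0x32 @ 1",
    "initcall jent_mod_init+0x0/0x32 returned 0 after 27185 usecs",
    "calling sg_pool_init+0x0/0xb4 @ 1",
    "initcall sg_pool_init+0x0/0xb4 returned 0 after 462 usecs",
    "calling pcie_portdrv_init+0x0/0x43 @ 1",
    "pcieport 0000:00:1c.0: enabling bus mastering",
]

print(pending_initcalls(SAMPLE))
```

On a log that ends in a hang, whatever is left in the list is the
initcall that was still running. This only works if every "calling" and
"returned" line made it into the transcript, so a log with elided lines
will show false positives.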

Best
Zeno