Re: PCI device appears intermittently

From: Bjorn Helgaas
Date: Fri Apr 05 2019 - 10:10:56 EST


[+cc Borislav, Alan, Marcel, Johan, linux-usb, LKML]

On Tue, Apr 02, 2019 at 08:09:25AM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 28, 2019 at 09:22:03PM -0400, Ron Murray wrote:
> > I have an ASRock 970A-G/3.1 motherboard, which, with current Linux
> > kernels, occasionally "finds" an extra PCI device on the initial
> > scan.
>
> Hmmm, I don't have any good ideas. You mention "current" kernels. Is
> this a regression? If there is an earlier kernel that never finds
> this extra device, it's possible we could find the problem by
> bisecting. It's a little harder with intermittent problems like this,
> though.
>
> Is there any rhyme or reason to when the problem occurs? Do you dual
> boot with Windows? Does it happen after an unusual shutdown (crash,
> oops, etc)? Is there anything connected to that port?
>
> Can you collect the output of "sudo lspci -vvv" and the dmesg logs for
> successful and failing boots? Maybe attach them to a
> bugzilla.kernel.org entry.

The bugzilla entry is https://bugzilla.kernel.org/show_bug.cgi?id=203157
Thanks, Ron!

I unpacked the tar file and attached the individual files. I think
they might be labeled backwards, though. Compared to lspci-vvv.good,
lspci-vvv.bad contains two extra devices:

00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 4)
02:00.0 USB controller: ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller

The 00:09.0 bridge leads to bus 02, i.e., to the xHCI USB controller
at 02:00.0. Obviously if we don't find the bridge, we won't find the
USB controller behind it either.
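
For anyone who wants to repeat the comparison, here is a minimal
sketch (not part of the bug report) that lists the devices present in
one lspci listing but absent from another. It assumes the usual lspci
layout, where each device header line starts in column 0 with a
bus:device.function address and no PCI domain prefix. It matches on
the description rather than the address, because bus numbers shift
when an extra bridge appears. The names parse_devices and
extra_devices are made up for illustration.

```python
import re
from collections import Counter

# Device header lines look like "00:09.0 PCI bridge: ..."; the detail
# lines printed by -vvv are indented and therefore do not match.
ADDR_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.*)$")


def parse_devices(text):
    """Return a list of (address, description) for each device header line."""
    devices = []
    for line in text.splitlines():
        m = ADDR_RE.match(line)
        if m:
            devices.append((m.group(1), m.group(2)))
    return devices


def extra_devices(reference, other):
    """Return devices in `other` whose description has no match in `reference`.

    Descriptions are matched as a multiset, so two identical controllers
    in `reference` cancel at most two in `other`.
    """
    remaining = Counter(desc for _, desc in parse_devices(reference))
    extra = []
    for addr, desc in parse_devices(other):
        if remaining[desc] > 0:
            remaining[desc] -= 1
        else:
            extra.append((addr, desc))
    return extra
```

Passing the contents of lspci-vvv.good as `reference` and
lspci-vvv.bad as `other` should report the devices that only show up
in the second file. Devices that merely moved to a different bus
number are not reported, since only descriptions are compared.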

Is there anything you can relate to the USB device? I assume it's not
something like "the USB controller appears only when there's a USB
device connected to it"? No flaky USB device connected (low battery
or something)?

Looking at USB, I see this weirdness:

$ grep "BCM\|000272C95496" dmesg.*
dmesg.bad:usb 4-2: Product: BCM20702A0
dmesg.bad:usb 4-2: SerialNumber: 000272C95496
dmesg.bad:Bluetooth: hci0: BCM: chip id 63
dmesg.bad:Bluetooth: hci0: BCM: features 0x07
dmesg.bad:Bluetooth: hci0: BCM20702A
dmesg.bad:Bluetooth: hci0: BCM20702A1 (001.002.014) build 0000
dmesg.bad:Bluetooth: hci0: BCM20702A1 (001.002.014) build 1338
dmesg.good:usb 4-2: Product: BCM920702 Bluetooth 4.0
dmesg.good:usb 4-2: SerialNumber: 000272C95496
dmesg.good:Bluetooth: hci0: BCM: chip id 63
dmesg.good:Bluetooth: hci0: BCM: features 0x07
dmesg.good:Bluetooth: hci0: BCM20702A1 (001.002.014) build 1338
dmesg.good:Bluetooth: hci0: BCM20702A1 (001.002.014) build 1338

Looks like the same device, but for some reason it identifies
differently. Added some Bluetooth guys in case they have an idea;
I sure don't.
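
To make the difference easier to see, the grep output above can be
reduced to the messages that appear in only one of the two logs. This
is just a sketch under the assumption that the input is grep's
"filename:message" format as shown above; unique_messages is a
hypothetical helper name, not an existing tool.

```python
def unique_messages(grep_output):
    """Group 'file:message' lines by file; keep messages seen in only one file."""
    by_file = {}
    for line in grep_output.splitlines():
        # Split at the first colon only; the message itself contains colons.
        name, sep, msg = line.partition(":")
        if sep:
            by_file.setdefault(name, []).append(msg)

    result = {}
    for name, msgs in by_file.items():
        # Every message that appears in any of the other files.
        others = {m for n, ms in by_file.items() if n != name for m in ms}
        # dict.fromkeys drops repeated messages while preserving order.
        result[name] = [m for m in dict.fromkeys(msgs) if m not in others]
    return result
```

Fed the grep output above, this leaves the differing Product strings
for both files, plus the "BCM20702A" and "build 0000" lines that occur
only in dmesg.bad.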

> > I wouldn't mind, but it finds it early in the piece, and that
> > changes the PCI allocation of my Ethernet board from 02:06.0 to
> > 03:06.0, and, with systemd, Linux comes up with no network
> > connection. A reboot fixes it, mostly.
> >
> > Here are the first few lines of 'lspci' when Linux doesn't find the
> > extra device:
> >
> > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD9x0/RX980 Host Bridge (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GFX port 0)
> > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> >
> > and here's the same thing when it does:
> >
> > > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD9x0/RX980 Host Bridge (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GFX port 0)
> > > 00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 4)
> > > 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> > > 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> >
> > The 00:09.0 device is the extra one. Anything else I can provide to assist?