Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)

From: Koen Vandeputte
Date: Wed Apr 13 2022 - 08:53:06 EST



On 06.04.22 16:08, Koen Vandeputte wrote:

On 01.04.22 15:50, Koen Vandeputte wrote:

On 15.02.22 07:35, Krzysztof Hałasa wrote:
Hi Bjorn,

Bjorn Helgaas <helgaas@xxxxxxxxxx> writes:

Koen collected some interesting logs at
https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@xxxxxxxxxxxx/
They're from v5.10, which was before all of Krzysztof W's nice work
converting to static attributes, but Koen's log shows the error
happening in the pci_sysfs_init() initcall, which is *after*
imx6_pcie_probe():

   imx6_pcie_probe                # probably device initcall (level 6)
     ...
       pci_create_sysfs_dev_files

   pci_sysfs_init                 # late initcall (level 7)
     pci_create_sysfs_dev_files
       "sysfs: cannot create duplicate filename"

Well, imx6_pcie_probe() is called indirectly by
platform_driver_register(). I guess that, once the driver is registered,
the probe knows nothing about the initcall ordering.

It looks like the problem is that imx6_pcie_probe() (via
dw_pcie_host_init() -> pci_host_probe()) is interfering with
pci_sysfs_init(). This may eventually cause some invalid memory access
as well.

BTW I thought for a moment that maybe 5.14 is free from this. I was
wrong. The problem doesn't manifest itself on my custom i.MX6 device
(using Tinyrex CPU module from Voipac/Fedevel), perhaps because I don't
use any PCI devices there. It does on the Ventana SBC from Gateworks,
though. BTW the above (and below) is v5.16.

It goes like this:
[0.096212] do_initcall_level: 6
[0.105625] imx6_pcie_init
[0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
[0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:

[0.322613] imx6q-pcie 1ffc000.pcie: Link up
[0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
[0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]

[0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
[0.405262] do_initcall_level: 7

[0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
[0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files

[0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
[0.410551] [<80918710>] (pci_sysfs_init)
[0.410568] [<8010166c>] (do_one_initcall)

[0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<

[0.533843] [<803f1c74>] (pci_bus_add_devices)
[0.533862] [<803f574c>] (pci_host_probe)
[0.533879] [<80414310>] (dw_pcie_host_init)
[0.533895] [<80681ac8>] (imx6_pcie_probe)
[0.533915] [<8045e9e4>] (platform_probe)
(Repeats multiple times, I guess for each PCI device)

[0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<

[0.692244] do_initcall_level: 7 END
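
The interleaving in the log above can be reduced to a small userspace model. This is only a sketch, not kernel code: the struct, the model_* functions and count_duplicates() are made up for illustration, and the "sysfs_initialized" gate with its -EACCES early return is an assumption about how pci_create_sysfs_dev_files() behaves before the late initcall has run. It only shows why two creation attempts can hit the same device when pci_sysfs_init() runs while the probe is still between the bus scan and pci_bus_add_devices().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: names echo the kernel functions in the log,
 * but nothing here is kernel code. */
struct model_dev {
	bool on_device_list; /* scanned, visible to the late-initcall loop */
	bool files_created;  /* stands in for the sysfs resource files */
};

static bool sysfs_initialized; /* assumed gate, opened by the late initcall */

/* Models pci_create_sysfs_dev_files(): refuses to run before the late
 * initcall and reports -EEXIST on a second creation attempt. */
static int model_create_sysfs_dev_files(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!sysfs_initialized)
		return -EACCES;
	if (dev->files_created)
		return -EEXIST;
	dev->files_created = true;
	return 0;
}

/* Models pci_sysfs_init(): opens the gate, then handles known devices. */
static int model_pci_sysfs_init(struct model_dev *dev)
{
	sysfs_initialized = true;
	if (!dev->on_device_list)
		return 0;
	return model_create_sysfs_dev_files(dev);
}

/* Plays one boot and counts "duplicate filename" style failures. */
int count_duplicates(bool initcall_runs_mid_probe)
{
	struct model_dev dev = { false, false };
	int dups = 0;

	sysfs_initialized = false;
	dev.on_device_list = true; /* probe: bus scan has found the device */

	if (initcall_runs_mid_probe)
		dups += (model_pci_sysfs_init(&dev) == -EEXIST);

	/* probe: the pci_bus_add_devices() step */
	dups += (model_create_sysfs_dev_files(&dev) == -EEXIST);

	if (!initcall_runs_mid_probe)
		dups += (model_pci_sysfs_init(&dev) == -EEXIST);

	return dups;
}
```

In this model count_duplicates(true) returns 1 and count_duplicates(false) returns 0. In the model it is always the second caller that sees the duplicate; on real hardware the warning may come from either path, depending on timing.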


Hi all,

Any update on this topic?
I just tested kernel 5.15 on imx6 (Gateworks Ventana 5200) and as soon as I connect a PCIe device to one of the ports, the following happens:

https://pastebin.com/raw/mgfSvTRB

Any idea if this is related?


Thanks,

Koen

Hi all,

I tested a bit more today and simply let the board reboot all day long.
After roughly 20 reboots, it suddenly booted cleanly once, without any errors or warnings.

Looks like a race condition.

Any idea?

Thanks,

Koen

As an addendum:

This issue is seen on a Gateworks Ventana GW5200, which has a PLX bridge.
I also have a GW5100, which is identical but without the PLX bridge, and it works fine every time.

So the issue seems to be triggered when a PCI device sits behind a bridge.


Hope this helps to reproduce it easily.

Koen