Re: Help needed in understanding weird PCIe issue on imx6q (PCIe just goes bad)

From: Bjorn Helgaas
Date: Wed Feb 26 2020 - 18:27:17 EST


[+cc Richard, Lucas]

On Wed, Feb 26, 2020 at 05:25:52PM -0600, Bjorn Helgaas wrote:
> On Sat, Feb 22, 2020 at 04:25:41PM +0100, Fawad Lateef wrote:
> > Hello,
> >
> > I am trying to figure out an issue on our i.MX6Q-based design
> > where the PCIe interface goes bad.
> >
> > We have a Phytec i.MX6Q eMMC SOM attached to our custom-designed
> > board. The PCIe root complex of the i.MX6Q is attached to a PLX
> > switch (PEX8605).
> >
> > The Linux kernel versions are 4.19.9x and 4.14.134 (from Phytec's
> > linux-mainline repo). The kernel does not have PCIe hot-plug or PNP
> > enabled in its config.
> >
> > The PLX switch #PERST is attached to a GPIO pin and keeps the switch
> > disabled until Linux has booted. So at boot time only the PCIe root
> > complex is initialized by the kernel.
> >
> > After boot, "lspci -v" shows everything looking good on the PCIe
> > root complex (below):
> >
> > ~ # lspci -v
> > 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00 [Normal decode])
> > Flags: bus master, fast devsel, latency 0, IRQ 295
> > Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
> > Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> > I/O behind bridge: None
> > Memory behind bridge: None
> > Prefetchable memory behind bridge: None
> > [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> > Capabilities: [70] Express Root Port (Slot-), MSI 00
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [140] Virtual Channel
> > Kernel driver in use: pcieport
> >
> >
> > Then I enable the PLX switch via its #PERST pin; everything is still
> > good (no rescan has been done on Linux yet):
> >
> > ~ # echo 139 > /sys/class/gpio/export
> > ~ # echo out > /sys/class/gpio/gpio139/direction
> > ~ # echo 1 > /sys/class/gpio/gpio139/value
> > ~ # lspci -v
> > 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00 [Normal decode])
> > Flags: bus master, fast devsel, latency 0, IRQ 295
> > Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
> > Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> > I/O behind bridge: None
> > Memory behind bridge: None
> > Prefetchable memory behind bridge: None
> > [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> > Capabilities: [70] Express Root Port (Slot-), MSI 00
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [140] Virtual Channel
> > Kernel driver in use: pcieport
> >
> >
> > Now I just disable the PLX switch again (put it in reset). Linux
> > doesn't see the switch yet, as no PCIe rescan was done. Running
> > "lspci -v" now shows that the root complex has gone bad:
> >
> > ~ # echo 0 > /sys/class/gpio/gpio139/value
> > ~ # lspci -v
> > 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00 [Normal decode])
> > Flags: fast devsel, IRQ 295
> > Memory at 01000000 (64-bit, prefetchable) [disabled] [size=1M]
> > Bus: primary=00, secondary=00, subordinate=00, sec-latency=0
> > I/O behind bridge: 00000000-00000fff [size=4K]
> > Memory behind bridge: 00000000-000fffff [size=1M]
> > Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
> > [virtual] Expansion ROM at 01100000 [disabled] [size=64K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > Capabilities: [70] Express Root Port (Slot-), MSI 00
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [140] Virtual Channel
> > Kernel driver in use: pcieport
> >
> > ~ # uname -a
> > Linux buildroot-2019.08-imx6 4.14.134-phy2 #1 SMP Thu Feb 20 12:13:33 UTC 2020 armv7l GNU/Linux
> > ~ #
> >
> >
> > I am really not sure what is going wrong here. Am I missing
> > something basic?
>
> I agree, it looks like something's wrong, but I really don't have any
> ideas.
>
> I would start by using "lspci -xxxx" to see the actual values we get
> from config space. It looks like we're reading zeros from at least
> the bus and window registers.
>
> You could also instrument the i.MX config accessors in case there's
> something strange going on there. Maybe try to reproduce this on a
> current upstream kernel?
>
> Bjorn
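
To make the "lspci -xxxx" suggestion above a bit more concrete, here is a
minimal sketch of how the raw dump could be decoded offline. It assumes
the standard type 1 (PCI-to-PCI bridge) header layout: command register at
0x04, primary/secondary/subordinate bus numbers at 0x18-0x1a, and the
memory base/limit registers at 0x20/0x22. The function names
(parse_hexdump, decode_bridge) are made up for illustration and are not
part of pciutils or the kernel.

```python
import re

# One hex-dump row as printed by "lspci -x...": "<offset>: b0 b1 ... b15".
# With -xxxx the offset can grow to three hex digits (e.g. "100:").
ROW = re.compile(r"^([0-9a-fA-F]{2,3}):((?:\s[0-9a-fA-F]{2}){1,16})$")


def parse_hexdump(text):
    """Return {offset: byte} for a single device's lspci hex dump."""
    cfg = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # device description line, blank line, etc.
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            cfg[base + i] = int(byte, 16)
    return cfg


def le16(cfg, off):
    """Read a little-endian 16-bit register."""
    return cfg[off] | (cfg[off + 1] << 8)


def decode_bridge(cfg):
    """Decode a few type 1 header fields (assumed standard layout)."""
    cmd = le16(cfg, 0x04)
    mem_base = (le16(cfg, 0x20) & 0xFFF0) << 16
    mem_limit = ((le16(cfg, 0x22) & 0xFFF0) << 16) | 0xFFFFF
    return {
        "memory_enabled": bool(cmd & 0x2),
        "bus_master": bool(cmd & 0x4),
        "primary": cfg[0x18],
        "secondary": cfg[0x19],
        "subordinate": cfg[0x1A],
        # base > limit means the window is closed ("None" in lspci -v)
        "mem_window": (mem_base, mem_limit) if mem_base <= mem_limit else None,
    }
```

One way to use it: capture "lspci -xxxx -s 00:00.0" into a file before
and after toggling the GPIO, run both dumps through parse_hexdump() and
decode_bridge(), and compare the results. If the bus and window registers
really read back as zero, the decoded memory window comes out as
00000000-000fffff, which is what the "bad" lspci -v output shows.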