Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

From: Daniel Thompson
Date: Fri Dec 11 2020 - 12:08:57 EST


On Fri, Dec 11, 2020 at 08:37:40AM -0600, Rob Herring wrote:
> On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
> >
> > I have been chasing down a problem enumerating an NVMe drive on a
> > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> > successfully if the we are emitting lots of console messages via a UART.
> > If the system is booted with `quiet` set then enumeration fails.
>
> We really need to get away from work-arounds for device X on host Y. I
> suspect they are not limited to that combination only...

No objection on that. This patch was essentially sharing the result of
an investigation where I got stuck at the "now fix it properly" stage!


> How exactly does it fail to enumerate? There's a (racy) linkup check
> on config accesses, is it reporting link down and skipping config
> accesses?

In dmesg terms it looked like this:

--- nvme_borked_gpu_working.linted.dmesg 2020-11-18 15:28:35.544118213 +0000
+++ nvme_working_gpu_working.linted.dmesg 2020-11-18 15:28:56.180076946 +0000
@@ -241,11 +241,19 @@
pci 0000:00:00.0: reg 0x38: [mem 0x9048000000-0x9048ffffff pref]
pci 0000:00:00.0: supports D1 D2
pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot
+pci 0000:01:00.0: [15b7:5009] type 00 class 0x010802
+pci 0000:01:00.0: reg 0x10: [mem 0x9049000000-0x9049003fff 64bit]
+pci 0000:01:00.0: reg 0x20: [mem 0x9049004000-0x90490040ff 64bit]
pci 0000:00:00.0: BAR 1: assigned [mem 0x9040000000-0x9043ffffff]
pci 0000:00:00.0: BAR 0: assigned [mem 0x9044000000-0x9044ffffff]
pci 0000:00:00.0: BAR 6: assigned [mem 0x9045000000-0x9045ffffff pref]
+pci 0000:00:00.0: BAR 14: assigned [mem 0x9046000000-0x90460fffff]
+pci 0000:01:00.0: BAR 0: assigned [mem 0x9046000000-0x9046003fff 64bit]
+pci 0000:01:00.0: BAR 4: assigned [mem 0x9046004000-0x90460040ff 64bit]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
+pci 0000:00:00.0: bridge window [mem 0x9046000000-0x90460fffff]
pci 0000:00:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 256
+pci 0000:01:00.0: Max Payload Size set to 256/ 512 (was 128), Max Read Rq 256
layerscape-pcie 3800000.pcie: host bridge /soc/pcie@3800000 ranges:
layerscape-pcie 3800000.pcie: MEM 0xa040000000..0xa07fffffff -> 0x0040000000
layerscape-pcie 3800000.pcie: PCI host bridge to bus 0001:00

... and be aware that the last three lines here are another PCIe
controller coming up just fine and it is about to detect the GPU
(which like the NVMe is also gen3) just fine whether or not we
add a short delay.


> > I guessed this would be due to the timing impact of printk-to-UART and
> > tried to find out where a delay could be added to provoke a successful
> > enumeration.
> >
> > This patch contains the results. The delay length (1ms) was selected by
> > binary searching downwards until the delay was not effective for these
> > devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
> >
> > I have also included the workaround twice (conditionally compiled). The
> > first change is the *latest* possible code path that we can deploy a
> > delay whilst the second is the earliest place I could find.
> >
> > The summary is that the critical window were we are currently relying on
> > a call to the console UART code can "mend" the driver runs from calling
> > dw_pcie_setup_rc() in host init to just before we read the state in the
> > link up callback.
> >
> > Signed-off-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> > ---
> >
> > Notes:
> > This patch is RFC (and HACK) because I don't have much clue *why* this
> > patch works... merely that this is the smallest possible change I need
> > to replicate the current accidental printk() workaround. Perhaps one
> > could argue that RFC here stands for request-for-clue. All my
> > observations and changes here are empirical and I don't know how best to
> > turn them into something that is not a hack!
> >
> > BTW I noticed many other pcie-designware drivers take advantage
> > of a function called dw_pcie_wait_for_link() in their init paths...
> > but my naive attempts to add it to the layerscape driver results
> > in non-booting systems so I haven't embarrassed myself by including
> > that in the patch!
>
> You need to look at what's pending for v5.11, because I reworked this
> to be more unified. The ordering of init is also possibly changed. The
> sequence is now like this:
>
> dw_pcie_setup_rc(pp);
> dw_pcie_msi_init(pp);
>
> if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
> ret = pci->ops->start_link(pci);
> if (ret)
> goto err_free_msi;
> }
>
> /* Ignore errors, the link may come up later */
> dw_pcie_wait_for_link(pci);

Thanks. That looks likely to fix it since IIUC dw_pcie_wait_for_link()
will end up waiting somewhat like the double check I added to
ls_pcie_link_up().

I'll take a look at let you know.


Daniel.