Re: [PATCH v2] arm64: dts: rockchip: Add PCIe pinctrls to Turing RK1
From: Heiko Stübner
Date: Mon Jul 29 2024 - 15:09:52 EST
Hi Jonathan, Sam,
Am Mittwoch, 5. Juni 2024, 21:45:42 CEST schrieb Jonathan Bennett:
> On 12/8/23 11:27 AM, Sam Edwards wrote:
> > On 12/8/23 04:05, Heiko Stübner wrote:
> >> Am Freitag, 8. Dezember 2023, 07:25:10 CET schrieb Sam Edwards:
> >>> The RK3588 PCIe 3.0 controller seems to have unpredictable behavior
> >>> when
> >>> no CLKREQ/PERST/WAKE pins are configured in the pinmux. In
> >>> particular, it
> >>> will sometimes (varying between specific RK3588 chips, not over
> >>> time) shut
> >>> off the DBI block, and reads to this range will instead stall
> >>> indefinitely.
> >>>
> >>> When this happens, it will prevent Linux from booting altogether. The
> >>> PCIe driver will stall the CPU core once it attempts to read the
> >>> version
> >>> information from the DBI range.
> >>>
> >>> Fix this boot hang by adding the correct pinctrl configuration to the
> >>> PCIe 3.0 device node, which is the proper thing to do anyway. While
> >>> we're at it, also add the necessary configuration to the PCIe 2.0 node,
> >>> which may or may not fix the equivalent problem over there -- but is
> >>> the
> >>> proper thing to do anyway. :)
> >>>
> >>> Fixes: 2806a69f3fef6 ("arm64: dts: rockchip: Add Turing RK1 SoM
> >>> support")
> >>> Signed-off-by: Sam Edwards <CFSworks@xxxxxxxxx>
> >>> ---
> >>>
> >>> Hi list,
> >>>
> >>> Compared to v1, v2 removes the `reset-gpios` properties as well --
> >>> this should
> >>> give control of the PCIe resets exclusively to the PCIe cores. (And
> >>> even if the
> >>> `reset-gpios` props had no effect in v1, it'd be confusing to have
> >>> them there.)
> >>
> >> Hmm, I'd think this could result in differing behaviour.
> >>
> >> I.e. I tried the same on a different board with a nvme drive on the
> >> pci30x4
> >> controller. But moving the reset from the gpio-way to "just" setting the
> >> perstn pinctrl, simply hung the controller when probing the device.
> >
> > Ah, I'm guessing it died in:
> > ver = dw_pcie_readl_dbi(pci, PCIE_VERSION_NUMBER);
> >
> > If so, that's the same hang that this patch is aiming to solve. I'm
> > starting to wonder if having muxed pins != 1 for a given signal upsets
> > the RC(s). Is your board (in an earlier boot stage:
> > reset/maskrom/bootloader) muxing a different perstn pin option to that
> > controller (and Linux doesn't know to clear it away)? Then when you
> > add the perstn pinctrl the total number of perstns muxed to the
> > controller comes to 2, and without a reset-gpios = <...>; to take it
> > away again, that number stays at 2 to cause the hang upon the DBI read?
> >
> > If so, that'd mean the change resolves the hang for me, because it
> > brings the total up to 1 (from 0), but also causes the hang for you,
> > because it brings the total up to 2 (away from 1).
> >
> >>
> >> So I guess I'd think the best way would be to split the pinctrl up
> >> into the
> >> 3 separate functions (clkreqn, perstn, waken) so that boards can include
> >> them individually.
> >
> > Mm, maybe. Though that might be more appropriate if a board comes
> > along that doesn't connect all of those signals to the same group of
> > pins. I worry that attempting to solve this hang by doing that will
> > instead just mask the real problem.
> >
> >>
> >> Nobody is using the controller pinctrl entries so far anyway :-) .
> >
> > Then it's interesting that this board requires them in order to avoid
> > a hang on boot. I'll see if there's anything else that I can find out.
>
> I've just finished testing the latest iteration of this patch with
> 6.10-rc2 on my RK1 on a Turing Pi 2 carrier board. I can confirm that
> unpatched mainline fails to boot with the same hang described here, and
> the patch does make the board boot again.
Can you possibly test if
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28b8d7793b8573563b3d45321376f36168d77b1e
changes anything? In 6.11-rc1 now.
The PERST# toggling happening before that patch could've caused
issues with your PCIe device.
Heiko