Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

From: Bjorn Helgaas
Date: Thu Jun 24 2021 - 19:28:47 EST


On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
> On 2021-06-24 22:57, Bjorn Helgaas wrote:
> > On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> > > IRQ handlers that are registered for shared interrupts can be called at
> > > any time after they have been registered using the request_irq() function.
> > >
> > > It's up to drivers to ensure that it's always safe for these to be called.
> > >
> > > Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> > > their handlers are registered very early in the probe function, an error
> > > later can lead to these handlers being executed before all the required
> > > resources have been properly set up.
> > >
> > > For example, the rockchip_pcie_read() function used by these IRQ handlers
> > > expects that some PCIe clocks will already be enabled, otherwise trying
> > > to access the PCIe registers causes the read to hang and never return.
> >
> > The read *never* completes? That might be a bit problematic because
> > it implies that we may not be able to recover from PCIe errors. Most
> > controllers will timeout eventually, log an error, and either
> > fabricate some data (typically ~0) to complete the CPU's read or cause
> > some kind of abort or machine check.
> >
> > Just asking in case there's some controller configuration that should
> > be tweaked.
>
> If I'm following correctly, that'll be a read transaction to the native side
> of the controller itself; it can't complete that read, or do anything else
> either, because it's clock-gated, and thus completely oblivious (it might be
> that if another CPU was able to enable the clocks then everything would
> carry on as normal, or it might end up totally deadlocking the SoC
> interconnect). I think it's safe to assume that in that state nothing of
> importance would be happening on the PCIe side, and even if it was we'd
> never get to know about it.

Oh, right, that makes sense. I was thinking about the PCIe side, but
if the controller itself isn't working, of course we wouldn't get that
far.
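To make the ordering problem in the patch description concrete, here is a rough userspace model in C. It rests on stated assumptions: every name in it (struct model_pcie, model_read(), model_request_shared_irq(), and the two probe variants) is hypothetical, the "clocks" are a single boolean, and a read with the clocks gated is counted rather than hanging. It contrasts registering the shared handler before the clocks are enabled with enabling the clocks first. It is not the driver's code, and it leaves out error unwinding, where a real driver would also need to free the IRQ before gating the clocks again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the probe ordering problem. */
struct model_pcie {
    bool clks_enabled;  /* stands in for the controller's clocks */
    bool irq_requested; /* shared handler is registered and may run */
    int gated_reads;    /* register reads attempted with clocks gated */
};

/* Stands in for rockchip_pcie_read(): on real hardware a read with the
 * clocks gated never completes, so the model only counts it. */
unsigned int model_read(struct model_pcie *p)
{
    assert(p != NULL);
    if (!p->clks_enabled) {
        p->gated_reads++;
        return 0;
    }
    return 0x1u; /* arbitrary status value */
}

/* A shared handler can run any time after registration, e.g. because
 * another device on the same line raised the interrupt. */
void model_irq_handler(struct model_pcie *p)
{
    (void)model_read(p);
}

void model_request_shared_irq(struct model_pcie *p)
{
    p->irq_requested = true;
    model_irq_handler(p); /* simulate another device firing right away */
}

/* Handler registered first; a later probe step fails before the
 * clocks are ever enabled. */
int model_probe_irq_first(struct model_pcie *p, bool later_step_fails)
{
    model_request_shared_irq(p);
    if (later_step_fails)
        return -1;
    p->clks_enabled = true;
    return 0;
}

/* Clocks enabled before the handler can run, so every read the
 * handler performs happens with the clocks on. */
int model_probe_clocks_first(struct model_pcie *p, bool later_step_fails)
{
    p->clks_enabled = true;
    model_request_shared_irq(p);
    if (later_step_fails)
        return -1;
    return 0;
}
```

With later_step_fails set, the first ordering records one gated read and the second records none.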

I would expect that the CPU itself would have some kind of timeout for
the read, but that's far outside of the PCI world.
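For contrast with the clock-gated case, a rough model of the behaviour described earlier for most controllers: the controller waits a bounded time for a completion and, on timeout, completes the CPU's read with all-ones data. The timeout budget, the struct, and the function names are all hypothetical. MODEL_READ_STUCK stands for the clock-gated case, where the controller cannot run any timeout logic at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TIMEOUT_POLLS 1000u /* hypothetical timeout budget */

enum model_read_status {
    MODEL_READ_OK,
    MODEL_READ_TIMEOUT, /* completed with fabricated all-ones data */
    MODEL_READ_STUCK,   /* controller clock-gated: nothing can respond */
};

struct model_ctrl {
    bool clocked;              /* is the controller's own clock running? */
    unsigned completion_after; /* poll at which a completion arrives; 0 = never */
    uint32_t completion_data;
};

enum model_read_status model_ctrl_read(const struct model_ctrl *c, uint32_t *val)
{
    assert(c != NULL && val != NULL);
    if (!c->clocked) {
        /* Real hardware would stall the CPU here; no meaningful value. */
        *val = 0;
        return MODEL_READ_STUCK;
    }
    for (unsigned poll = 1; poll <= MODEL_TIMEOUT_POLLS; poll++) {
        if (c->completion_after != 0 && poll >= c->completion_after) {
            *val = c->completion_data;
            return MODEL_READ_OK;
        }
    }
    /* Timeout: finish the CPU's read with all-ones data. */
    *val = 0xFFFFFFFFu;
    return MODEL_READ_TIMEOUT;
}

/* All-ones can also be a legitimate register value, so this is only a hint. */
bool model_possible_error(uint32_t val)
{
    return val == 0xFFFFFFFFu;
}
```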

Bjorn