Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

From: Bjorn Helgaas
Date: Wed Jun 30 2021 - 16:30:35 EST


On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
> On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> > [+cc Michal, Jingoo, Thierry, Jonathan]
>
> [snip]
>
> >
> > I think the above commit log is perfectly accurate, but all the
> > details might suggest that this is something specific to rockchip or
> > CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> > fundamental problem, which is actually very simple: we registered IRQ
> > handlers before we were ready for them to be called.
> >
> > I propose the following commit log in the hope that it would help
> > other driver authors to make similar fixes:
> >
> > PCI: rockchip: Register IRQ handlers after device and data are ready
> >
> > An IRQ handler may be called at any time after it is registered, so
> > anything it relies on must be ready before registration.
> >
> > rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
> > read registers in the PCIe controller, but we registered them before
> > turning on clocks to the controller. If either is called before the clocks
> > are turned on, the register reads fail and the machine hangs.
> >
> > Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
> > but we installed it before initializing irq_domain.
> >
> > Register IRQ handlers after their data structures are initialized and
> > clocks are enabled.
> >
> > If this is inaccurate or omits something important, let me know. I
> > can make any updates locally.
> >
>
> I think your description is accurate and agree that the commit message may
> be misleading. As you said, this is a general problem and the fact that an
> IRQ is shared and CONFIG_DEBUG_SHIRQ fires a spurious interrupt just makes
> the assumptions in the driver fall apart.
>
> But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
> option and shared interrupts? That way, other driver authors would know that
> enabling it might expose an underlying problem for them to fix.

Good idea, thanks! I added this; is it something like what you had in
mind?

Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
is being unregistered. An error during the probe path might cause this
unregistration and IRQ handler execution before the device or data
structure init has finished.
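For anyone reading this later, here is a minimal userspace sketch of the ordering rule. Everything in it is hypothetical: struct fake_pcie, fake_request_irq() and fake_free_irq() only model request_irq()/free_irq(). fake_free_irq() calls the handler once during unregistration, mimicking what CONFIG_DEBUG_SHIRQ does. This is not the rockchip code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel API: register an IRQ handler only after
 * everything it touches is ready, and unregister it before tearing down.
 */
struct fake_pcie {
	bool clocks_on;          /* stands in for the controller clocks */
	int *irq_domain;         /* stands in for rockchip->irq_domain */
	int domain_storage;
	bool handler_installed;
	int bad_accesses;        /* handler ran while device was not ready */
	int handled;             /* handler ran safely */
};

/* Handler needs clocks (register reads) and irq_domain (INTx demux). */
static void fake_irq_handler(struct fake_pcie *p)
{
	if (!p->clocks_on || p->irq_domain == NULL) {
		p->bad_accesses++;   /* on real hardware: hang or NULL deref */
		return;
	}
	p->handled++;
}

/* Like request_irq(): the handler may run any time after this returns. */
static void fake_request_irq(struct fake_pcie *p)
{
	assert(!p->handler_installed);
	p->handler_installed = true;
}

/* Like free_irq() with CONFIG_DEBUG_SHIRQ: call the handler once more. */
static void fake_free_irq(struct fake_pcie *p)
{
	if (p->handler_installed)
		fake_irq_handler(p);
	p->handler_installed = false;
}

/* Fixed ordering; fail_late simulates an error late in probe. */
static int fake_probe(struct fake_pcie *p, bool fail_late)
{
	p->clocks_on = true;                 /* 1. enable clocks */
	p->irq_domain = &p->domain_storage;  /* 2. init handler's data */
	fake_request_irq(p);                 /* 3. only now register handler */

	if (fail_late) {
		/* Unwind in reverse: handler first, then what it depends on. */
		fake_free_irq(p);
		p->irq_domain = NULL;
		p->clocks_on = false;
		return -1;
	}
	return 0;
}

/* Buggy ordering: handler registered before clocks and irq_domain. */
static int fake_probe_buggy(struct fake_pcie *p)
{
	fake_request_irq(p);
	/* An early probe error unregisters the handler; DEBUG_SHIRQ runs it. */
	fake_free_irq(p);
	return -1;
}
```

With a zero-initialized struct fake_pcie, fake_probe(&p, true) ends with bad_accesses == 0 because the simulated spurious call sees clocks on and irq_domain set, while fake_probe_buggy() records one bad access. The second case is exactly what CONFIG_DEBUG_SHIRQ is meant to flush out.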

Bjorn