Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

From: Bjorn Helgaas
Date: Thu Jul 01 2021 - 09:59:53 EST


On Thu, Jul 01, 2021 at 12:09:58AM +0200, Javier Martinez Canillas wrote:
> On 6/30/21 10:30 PM, Bjorn Helgaas wrote:
> > On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
>
> [snip]
>
> >> But maybe you can also add a paragraph that mentions the
> >> CONFIG_DEBUG_SHIRQ option and shared interrupts? That way, other
> >> driver authors could know that by enabling this an underlying
> >> problem might be exposed for them to fix.
> >
> > Good idea, thanks! I added this; is it something like what you
> > had in mind?
>
> Thanks a lot for doing this rewording. I just have a small nit for
> the text.
>
> > Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ
> > handler when it is being unregistered. An error during the
> > probe path might cause this unregistration and IRQ handler
> > execution before the device or data structure init has
> > finished.
>
> The IRQ handler is not called when unregistered, but it is called
> when another handler for the shared IRQ is unregistered. In this
> particular driver, both a "pcie-sys" and "pcie-client" handlers are
> registered, then an error leads to "pcie-sys" being unregistered and
> the handler for "pcie-client" being called.

Is this really true? I think that would mean CONFIG_DEBUG_SHIRQ would
not find this kind of bug unless we actually registered two or more
handlers for the shared IRQ, but it's still a bug even only one
handler is registered.

Looking at __free_irq() [1], my impression is that "action" is what
we're removing and action->handler() is the IRQ handler we call when
CONFIG_DEBUG_SHIRQ, so it doesn't look like it's calling the remaining
handlers after removing one of them.

> So maybe the following instead?
>
> Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ
> handlers when a handler for the shared IRQ is unregistered. An
> error during the probe path might cause this unregistration and
> handler execution before the device or data structure init has
> finished.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/manage.c?id=v5.13#n1805