Re: [PATCH v4] PCI: dwc: pci-dra7xx: Fix MSI IRQ handling

From: Bjorn Helgaas
Date: Tue Mar 31 2020 - 12:09:20 EST


On Mon, Mar 30, 2020 at 11:12:10PM +0200, Thomas Gleixner wrote:
> Bjorn Helgaas <helgaas@xxxxxxxxxx> writes:
> > On Fri, Mar 27, 2020 at 03:24:34PM +0530, Vignesh Raghavendra wrote:
> >> Due to an issue with the PCIe wrapper around the DWC PCIe IP on dra7xx,
> >> the driver needs to ensure that no pending MSI IRQ vector is set (i.e.
> >> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting the IRQ handler.
> >> Else, the dra7xx PCIe wrapper will not register new MSI IRQs even though
> >> PCIE_MSI_INTR0_STATUS shows IRQs are pending.
> >
> > I'm not an IRQ guy (real IRQ guys CC'd), but I'm wondering if this is
> > really a symptom of a problem in the generic DWC IRQ handling, not a
> > problem in dra7xx itself.
> >
> > I thought it was sort of standard behavior that a device would not
> > send a new MSI unless there was a transition from "no status bits set"
> > to "at least one status bit set". I'm looking at this text from the
> > PCIe r5.0 spec, sec 6.7.3.4:
>
> That's for the device side. But this is the host side and that consists
> of two components:
>
> 1) The actual PCIe host controller (DWC)
>
> 2) Some hardware wrapper around #1 to glue the host controller IP
> into the TI SoC.
>
> #1 contains an MSI message receiver unit. PCIE_MSI_INTR0_STATUS is part
> of that.
>
> If there is an MSI message sent to the host then the bit which is
> corresponding to the sent message (vector) is set in the status
> register. If a bit is set in the status register then the host
> controller raises an interrupt at its output.
>
> Here, if I deciphered the above changelog correctly, comes the wrapper
> glue #2 into play, which seems to be involved in forwarding the host
> controller interrupt to the CPU's interrupt controller (GIC) and that
> forwarding mechanism seems to have some issue.
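
To make the failure mode concrete, here is a toy model in plain C of the
behaviour described in the changelog and in Thomas's explanation. It is only a
sketch, not the driver code: struct msi_model, msi_arrive(), handle_pass() and
handle_until_clear() are hypothetical names, and the assumption that the
wrapper forwards an interrupt only when the status register goes from zero to
non-zero is mine. The changelog does not spell out the exact mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only. All names are hypothetical, not the real driver or HW. */
struct msi_model {
	uint32_t status;   /* stands in for PCIE_MSI_INTR0_STATUS */
	unsigned cpu_irqs; /* interrupts the wrapper forwarded to the CPU */
};

/*
 * An endpoint sends an MSI for 'vector' (must be < 32). The receiver sets
 * the matching status bit. ASSUMPTION: the wrapper forwards an interrupt
 * only on a 0 -> non-zero transition of the status register.
 */
static void msi_arrive(struct msi_model *m, unsigned vector)
{
	bool was_idle = (m->status == 0);

	m->status |= UINT32_C(1) << vector;
	if (was_idle)
		m->cpu_irqs++;
}

/*
 * One handler pass: read the status, handle every bit that was set in that
 * read, then clear exactly those bits. If racing_vector >= 0, an MSI for
 * that vector arrives between the read and the clear, which is the race
 * window. Returns the number of vectors handled.
 */
static unsigned handle_pass(struct msi_model *m, int racing_vector)
{
	uint32_t pending = m->status;
	unsigned handled = 0;

	if (racing_vector >= 0)
		msi_arrive(m, (unsigned)racing_vector);

	for (unsigned v = 0; v < 32; v++)
		if (pending & (UINT32_C(1) << v))
			handled++;

	m->status &= ~pending; /* models write-1-to-clear of the handled bits */
	return handled;
}

/*
 * Handler in the spirit of the fix: repeat until the status register has
 * been seen as 0, so the next MSI causes a fresh 0 -> non-zero transition.
 */
static unsigned handle_until_clear(struct msi_model *m, int racing_vector)
{
	unsigned handled = 0;

	while (m->status != 0) {
		handled += handle_pass(m, racing_vector);
		racing_vector = -1; /* inject the racing MSI only once */
	}
	return handled;
}
```

In this model, a single handle_pass() with a racing MSI leaves a status bit
set on exit, so later calls to msi_arrive() never bump cpu_irqs again, even
though the status register shows pending vectors. handle_until_clear() picks
up the racing vector on its second pass and exits only once the register has
been seen as 0, after which new MSIs are forwarded again.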

Sorry for muddying the waters, and thanks for clarifying it, Thomas.

This patch is on its way to v5.7, and I guess we'll worry about
whether the interrupt chip reimplementation is overkill later.

Bjorn