Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling

From: Vignesh R
Date: Wed Feb 14 2018 - 23:29:07 EST


Hi,

On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
>> We need to ensure that there are no pending MSI IRQ vector set (i.e
>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
>>
>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
>> 8260 used to throw following error and stall during ping/iperf3 tests.
>>
>> [ÂÂ 97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
>>
>> Signed-off-by: Vignesh R <vigneshr@xxxxxx>
>> ---
>>Â drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>>Â 1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>> index ed8558d638e5..3420cbf7b60a 100644
>> --- a/drivers/pci/dwc/pci-dra7xx.c
>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>ÂÂÂÂÂÂÂ struct dra7xx_pcie *dra7xx = arg;
>>ÂÂÂÂÂÂÂ struct dw_pcie *pci = dra7xx->pci;
>>ÂÂÂÂÂÂÂ struct pcie_port *pp = &pci->pp;
>> +ÂÂÂÂ int count = 0;
>>ÂÂÂÂÂÂÂ unsigned long reg;
>>ÂÂÂÂÂÂÂ u32 virq, bit;
>
>>ÂÂÂÂÂÂÂ reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
>> +ÂÂÂÂ dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>
>>ÂÂÂÂÂÂÂ switch (reg) {
>>ÂÂÂÂÂÂÂ case MSI:
>> -ÂÂÂÂÂÂÂÂÂÂÂÂ dw_handle_msi_irq(pp);
>> +ÂÂÂÂÂÂÂÂÂÂÂÂ /*
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ * Need to make sure no MSI IRQs are pending before
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ * exiting handler, else the wrapper will not catch new
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ * IRQs. So loop around till dw_handle_msi_irq() returns
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ * IRQ_NONE
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂ */
>> +ÂÂÂÂÂÂÂÂÂÂÂÂ while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ count++;
>> +
>> +ÂÂÂÂÂÂÂÂÂÂÂÂ if (count == 1000) {
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dev_err(pci->dev, "too much work in msi irq\n");
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ dra7xx_pcie_writel(dra7xx,
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ reg);
>> +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return IRQ_HANDLED;
>
> I am not merging any code patching this IRQ handling routine anymore
> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
> works (and how it is related to DW registers) and why this specific host
> controller needs handling that is not required by any other host
> controller relying on dw_handle_msi_irq().

Unlike other DW PCIe controllers, TI implementation has a wrapper on top
of DW core. This wrapper latches the DW core level MSI and legacy
interrupts and then propagates it to GIC.
PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
level. They are mapped on the MSI interrupt line of PCIe controller,
using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.

So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
then call dw_handle_msi_irq() to handle individual MSI vectors.
Driver has to make sure there are no pending vectors in DW core MSI
status register before exiting handler. Otherwise next MSI IRQ will not
be latched by the wrapper.


>
> I suspect there is a code design flaw with the way this host handles
> IRQs and we are going to find it and fix it the way it should, not with
> any plaster like this patch.
>

I agree there has been some churn wrt this wrapper level IRQ handler.
But, that was because hardware documentation/TRM did not match
actual behavior and so it took some time to understand how the
hardware is working. I have extensively tested this series on multiple
problematic PCIe USB cards and PCIe WiFi cards over week long stress
tests. And also had some agreement with internal hardware designers.
Hardware documentations will also be updated.


> Lorenzo
>
>> +ÂÂÂÂÂÂÂÂÂÂÂÂ }
>>ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
>>ÂÂÂÂÂÂÂ case INTA:
>>ÂÂÂÂÂÂÂ case INTB:
>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ break;
>>ÂÂÂÂÂÂÂ }
>
>> -ÂÂÂÂ dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>> -
>>ÂÂÂÂÂÂÂ return IRQ_HANDLED;
>>Â }
>
>> --
>> 2.16.1
>>

--
Regards
Vignesh