Re: [PATCH 1/1] PCI: dwc: Use regular interrupt instead of chained

From: Bjorn Helgaas
Date: Thu Jun 29 2023 - 16:59:17 EST


On Thu, Jun 29, 2023 at 04:42:07PM -0400, Radu Rendec wrote:
> On Thu, 2023-06-29 at 14:57 -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 29, 2023 at 02:30:19PM -0400, Radu Rendec wrote:
> > > The DesignWare PCIe host driver uses a chained interrupt to demultiplex
> > > the downstream MSI interrupts. On Qualcomm SA8540P Ride, enabling both
> > > pcie2a and pcie3a at the same time can create an interrupt storm where
> > > the parent interrupt fires continuously, even though reading the PCIe
> > > host registers doesn't identify any child MSI interrupt source. This
> > > effectively locks up CPU0, which spends all the time servicing these
> > > interrupts.
> > >
> > > This is a clear example of how bypassing the interrupt core by using
> > > chained interrupts can be very dangerous if the hardware misbehaves.
> > >
> > > Convert the driver to use a regular interrupt for the demultiplex
> > > handler. This allows the interrupt storm detector to detect the faulty
> > > interrupt and disable it, allowing the system to run normally.
> >
> > There are many other users of irq_set_chained_handler_and_data() in
> > drivers/pci/controller/.  Should they be similarly converted?  If not,
> > how do we decide which need to use irq_set_chained_handler_and_data()
> > and which do not?
>
> According to Thomas Gleixner, yes. Obviously I don't want to put words
> in his mouth, but I think that's the gist of what he said in a reply to
> an RFC patch that I sent a few weeks ago:
> https://lore.kernel.org/all/877csohcll.ffs@tglx/

Is it's a bug in pcie-designware-host.c, and it's also a bug in the
other callers, we should fix them all.

But you do imply that there's some DesignWare hardware issue involved,
too, so I guess it's possible the other drivers don't have an issue
and/or actually require the chained IRQs. That's why I asked how we
should decide.

> > > -static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > > +static void __dw_pcie_free_msi(struct dw_pcie_rp *pp, u32 num_ctrls)
> > >  {
> > >         u32 ctrl;
> > >  
> > > -       for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
> > > +       for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
> > >                 if (pp->msi_irq[ctrl] > 0)
> > > -                       irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
> > > -                                                        NULL, NULL);
> > > +                       free_irq(pp->msi_irq[ctrl], pp);
> > >         }
> > >  
> > >         irq_domain_remove(pp->msi_domain);
> > >         irq_domain_remove(pp->irq_domain);
> > >  }
> > >  
> > > +#define dw_pcie_free_msi(pp) __dw_pcie_free_msi(pp, MAX_MSI_CTRLS)
> >
> > What is the benefit of the dw_pcie_free_msi() macro?
>
> It allows me to add the num_ctrls parameter to the corresponding
> function (now renamed to __dw_pcie_free_msi()) without forcing all the
> existing call sites to send MAX_MSI_CTRLS explicitly.
>
> I needed that extra parameter to avoid duplicating the tear down code
> on the (new) error path in dw_pcie_msi_init() - see below.
>
> > >  static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > >  {
> > >         struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > @@ -361,9 +353,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >                 return ret;
> > >  
> > >         for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
> > > -               if (pp->msi_irq[ctrl] > 0)
> > > -                       irq_set_chained_handler_and_data(pp->msi_irq[ctrl],
> > > -                                                   dw_chained_msi_isr, pp);
> > > +               if (pp->msi_irq[ctrl] > 0) {
> > > +                       ret = request_irq(pp->msi_irq[ctrl], dw_pcie_msi_isr, 0,
> > > +                                         dev_name(dev), pp);
> > > +                       if (ret) {
> > > +                               dev_err(dev, "Failed to request irq %d: %d\n",
> > > +                                       pp->msi_irq[ctrl], ret);
> > > +                               __dw_pcie_free_msi(pp, ctrl);
>
> This is where I'm using the extra parameter. If we fail to request an
> interrupt, we need to free all the other interrupts that we have
> requested so far, to leave everything in a clean state. But we can't
> use MAX_MSI_CTRLS with __dw_pcie_free_msi() and rely on the check there
> because there may be extra interrupts that we haven't requested *yet*
> and we would attempt to free them.

Makes sense, thanks.

Bjorn