RE: [PATCH] PCI: layerscape: Change back to the default error response behavior

From: Z.q. Hou
Date: Mon Oct 12 2020 - 00:33:40 EST


Hi Rob and Kishon,

> -----Original Message-----
> From: Rob Herring <robh@xxxxxxxxxx>
> Sent: September 30, 2020 23:08
> To: Kishon Vijay Abraham I <kishon@xxxxxx>
> Cc: Z.q. Hou <zhiqiang.hou@xxxxxxx>; PCI <linux-pci@xxxxxxxxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; linux-arm-kernel
> <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>; Lorenzo Pieralisi
> <lorenzo.pieralisi@xxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; M.h.
> Lian <minghuan.lian@xxxxxxx>; Roy Zang <roy.zang@xxxxxxx>; Mingkai
> Hu <mingkai.hu@xxxxxxx>; Leo Li <leoyang.li@xxxxxxx>
> Subject: Re: [PATCH] PCI: layerscape: Change back to the default error
> response behavior
>
> On Wed, Sep 30, 2020 at 8:29 AM Kishon Vijay Abraham I <kishon@xxxxxx>
> wrote:
> >
> > Hi Hou,
> >
> > On 29/09/20 6:43 pm, Zhiqiang Hou wrote:
> > > From: Hou Zhiqiang <Zhiqiang.Hou@xxxxxxx>
> > >
> > > In the current error response behavior, it will send a SLVERR
> > > response to device's internal AXI slave system interface when the
> > > PCIe controller experiences an erroneous completion (UR, CA and CT)
> > > from an external completer for its outbound non-posted request,
> > > which will result in SError and crash the kernel directly.
> > > > This patch changes it back to the default behavior to increase the
> > > robustness of the kernel. In the default behavior, it always sends
> > > an OKAY response to the internal AXI slave interface when the
> > > controller gets these erroneous completions. And the AER driver will
> > > report and try to recover these errors.
> >
> > I don't think not forwarding any error interrupts is a good idea.
>
> Interrupts would be fine. Abort/SError is not. I think it is pretty clear what the
> correct behavior is for config accesses.

I agree with Rob.

>
> > Maybe
> > you could disable it while reading configuration space registers
> > (vendorID and deviceID) and then enable error forwarding back?
>
> To add to the locking (or lack of) problems in config accesses?

If we take this approach, then for the duration of the CFG access, errors on MEM_rd will not be forwarded either, so it's not a reliable mechanism for the user.
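
To make the window concrete, here is a minimal userspace sketch. It is not driver code: every name in it is hypothetical, and it assumes error forwarding is a single controller-wide switch, so a MEM read issued from another context while the switch is off is treated the same as the CFG read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not driver code: one controller-wide switch decides
 * whether an erroneous completion is forwarded to the AXI side. */
enum axi_resp { AXI_OKAY, AXI_SLVERR };

static bool err_fwd_enabled = true;

/* AXI response for an outbound non-posted request (CFG or MEM read).
 * completion_error models a UR, CA or completion timeout from the completer. */
static enum axi_resp outbound_read(bool completion_error)
{
	return (completion_error && err_fwd_enabled) ? AXI_SLVERR : AXI_OKAY;
}

/* Proposed scheme: switch forwarding off around the config access.
 * mem_resp_in_window receives the response of a failing MEM read that
 * another context is assumed to issue while the switch is off. */
static enum axi_resp cfg_read_unforwarded(bool cfg_error,
					  enum axi_resp *mem_resp_in_window)
{
	enum axi_resp cfg_resp;

	err_fwd_enabled = false;
	cfg_resp = outbound_read(cfg_error);
	*mem_resp_in_window = outbound_read(true); /* failing MEM read, same window */
	err_fwd_enabled = true;

	return cfg_resp;
}

static void demo(void)
{
	enum axi_resp mem_in_window;
	enum axi_resp cfg = cfg_read_unforwarded(true, &mem_in_window);

	assert(cfg == AXI_OKAY);                   /* intended: no SError on the CFG read */
	assert(mem_in_window == AXI_OKAY);         /* side effect: the MEM error is hidden */
	assert(outbound_read(true) == AXI_SLVERR); /* forwarded again after the window */
}
```

In this model the failing MEM read inside the window gets OKAY just like the CFG read, which is the unreliability I mean.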

Thanks,
Zhiqiang

>
> Rob