RE: [PATCH] PCI: layerscape: Change back to the default error response behavior
From: Z.q. Hou
Date: Wed Sep 30 2020 - 01:37:35 EST
Hi Bjorn,
Thanks a lot for your comments!
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: September 29, 2020 23:03
> To: Z.q. Hou <zhiqiang.hou@xxxxxxx>
> Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; lorenzo.pieralisi@xxxxxxx;
> robh@xxxxxxxxxx; bhelgaas@xxxxxxxxxx; M.h. Lian
> <minghuan.lian@xxxxxxx>; Roy Zang <roy.zang@xxxxxxx>; Mingkai Hu
> <mingkai.hu@xxxxxxx>; Leo Li <leoyang.li@xxxxxxx>
> Subject: Re: [PATCH] PCI: layerscape: Change back to the default error
> response behavior
>
> On Tue, Sep 29, 2020 at 09:13:28PM +0800, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang <Zhiqiang.Hou@xxxxxxx>
> >
> > In the current error response behavior, it will send a SLVERR response
> > to device's internal AXI slave system interface when the PCIe
> > controller experiences an erroneous completion (UR, CA and CT) from an
> > external completer for its outbound non-posted request, which will
> > result in SError and crash the kernel directly.
>
> Possible wording:
>
> As currently configured, when the PCIe controller receives a
> Completion with UR or CA status, or a Completion Timeout occurs, it
> sends a SLVERR response to the internal AXI slave system interface,
> which results in SError and a kernel crash.
>
> Please add a blank line between paragraphs, and s/This patch change back
> it/Change it/ below.
>
> > This patch change back it to the default behavior to increase the
> > robustness of the kernel. In the default behavior, it always sends an
> > OKAY response to the internal AXI slave interface when the controller
> > gets these erroneous completions. And the AER driver will report and
> > try to recover these errors.
>
> This reverts 84d897d69938 ("PCI: layerscape: Change default error response
> behavior"), so please mention that in the commit log, probably as:
>
> Fixes: 84d897d69938 ("PCI: layerscape: Change default error response
> behavior")
>
> Maybe it also needs a stable tag, e.g., v4.15+?
Thanks for your good suggestions! Will fix in v2.
>
> Since this is a pure revert, whatever problem 84d897d69938 fixed must now
> be fixed in some other way. Otherwise, this revert would just be
> reintroducing the problem fixed by 84d897d69938.
>
> This commit log should mention what that other fix is.
>
> AER is only a reporting mechanism, it is asynchronous to the instruction
> stream, and it's optional (may not be implemented in the hardware, and may
> not be supported by the kernel), so I'm not super convinced that it can be the
> answer to this problem.
>
Commit 84d897d69938 ("PCI: layerscape: Change default error response behavior") doesn't fix any issue. It just enables a feature of the DesignWare PCIe IP that forwards error responses to the AXI slave interface, a feature that is not enabled on any other platform with DWC IP. As mentioned in that commit, the default is to send an OKAY response to the AXI slave interface for erroneous completions of non-posted transactions, including CFG and MEM_rd transactions. However, upstream won't support platforms that abort on CFG accesses, so we have to change back to the default error response behavior and accept that MEM_rd errors aren't forwarded, just like on other DWC IP platforms.

As I recall, the SError interrupt is also an asynchronous abort, and it too is only a reporting mechanism. In contrast with AER, it crashes the kernel. So neither of these two mechanisms can ensure data integrity. Generally the upper-layer data transfer protocol has its own mechanism to ensure data integrity, so this is not an issue for most users. Anyone who really wants a kernel crash on a MEM_rd error can enable this in their local code.
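
For reference, a local change like that amounts to a single 32-bit write, roughly as in the user-space sketch below. The register offset and value are taken from the lines this patch removes. The fake_dbi buffer and the dbi_write32()/dbi_read32() helpers are hypothetical stand-ins for pci->dbi_base and iowrite32(), used only so the sketch compiles outside the kernel. It is an illustration under those assumptions, not driver code.

```c
#include <stdint.h>
#include <string.h>

/* Offset and value as defined in the lines removed by this patch. */
#define PCIE_ABSERR		0x8d0	/* Bridge Slave Error Response Register */
#define PCIE_ABSERR_SETTING	0x9401	/* Forward error of non-posted request */

/* Hypothetical stand-in for the memory-mapped DBI register space. */
#define FAKE_DBI_SIZE		0x1000
static uint8_t fake_dbi[FAKE_DBI_SIZE];

/* Hypothetical helper: plays the role of iowrite32() on the fake space. */
static void dbi_write32(uint8_t *dbi_base, uint32_t offset, uint32_t val)
{
	memcpy(dbi_base + offset, &val, sizeof(val));
}

/* Hypothetical helper: reads a 32-bit value back from the fake space. */
static uint32_t dbi_read32(const uint8_t *dbi_base, uint32_t offset)
{
	uint32_t val;

	memcpy(&val, dbi_base + offset, sizeof(val));
	return val;
}

/*
 * Hypothetical name: mirrors the removed ls_pcie_fix_error_response(),
 * which forwards error responses of outbound non-posted requests.
 */
static void ls_pcie_forward_error_response(uint8_t *dbi_base)
{
	dbi_write32(dbi_base, PCIE_ABSERR, PCIE_ABSERR_SETTING);
}
```

In a real local tree the equivalent would be to keep the removed ls_pcie_fix_error_response() and its call in ls_pcie_host_init().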
Thanks,
Zhiqiang
> > Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@xxxxxxx>
> > ---
> > drivers/pci/controller/dwc/pci-layerscape.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> > index f24f79a70d9a..e92ab8a77046 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> > @@ -30,8 +30,6 @@
> >
> > /* PEX Internal Configuration Registers */
> > #define PCIE_STRFMR1 0x71c /* Symbol Timer & Filter Mask Register1 */
> > -#define PCIE_ABSERR 0x8d0 /* Bridge Slave Error Response Register */
> > -#define PCIE_ABSERR_SETTING 0x9401 /* Forward error of non-posted request */
> >
> > #define PCIE_IATU_NUM 6
> >
> > @@ -123,14 +121,6 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
> > return 1;
> > }
> >
> > -/* Forward error response of outbound non-posted requests */
> > -static void ls_pcie_fix_error_response(struct ls_pcie *pcie)
> > -{
> > - struct dw_pcie *pci = pcie->pci;
> > -
> > - iowrite32(PCIE_ABSERR_SETTING, pci->dbi_base + PCIE_ABSERR);
> > -}
> > -
> > static int ls_pcie_host_init(struct pcie_port *pp)
> > {
> > struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > @@ -142,7 +132,6 @@ static int ls_pcie_host_init(struct pcie_port *pp)
> > * dw_pcie_setup_rc() will reconfigure the outbound windows.
> > */
> > ls_pcie_disable_outbound_atus(pcie);
> > - ls_pcie_fix_error_response(pcie);
> >
> > dw_pcie_dbi_ro_wr_en(pci);
> > ls_pcie_clear_multifunction(pcie);
> > --
> > 2.17.1
> >