RE: [PATCH] PCI: dwc: Added link up check in map_bus of dw_child_pcie_ops

From: Z.q. Hou
Date: Mon Oct 12 2020 - 00:41:19 EST


Hi Kishon,

> -----Original Message-----
> From: Kishon Vijay Abraham I <kishon@xxxxxx>
> Sent: 2020年10月1日 21:32
> To: Rob Herring <robh@xxxxxxxxxx>
> Cc: Gustavo Pimentel <Gustavo.Pimentel@xxxxxxxxxxxx>; Z.q. Hou
> <zhiqiang.hou@xxxxxxx>; Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxx; PCI <linux-pci@xxxxxxxxxxxxxxx>; Bjorn
> Helgaas <bhelgaas@xxxxxxxxxx>; Michael Walle <michael@xxxxxxxx>; Ard
> Biesheuvel <ardb@xxxxxxxxxx>
> Subject: Re: [PATCH] PCI: dwc: Added link up check in map_bus of
> dw_child_pcie_ops
>
> Hi Rob,
>
> On 30/09/20 8:31 pm, Rob Herring wrote:
> > On Wed, Sep 30, 2020 at 8:22 AM Kishon Vijay Abraham I <kishon@xxxxxx>
> wrote:
> >>
> >> Hi,
> >>
> >> On 29/09/20 10:41 pm, Rob Herring wrote:
> >>> On Tue, Sep 29, 2020 at 10:24 AM Gustavo Pimentel
> >>> <Gustavo.Pimentel@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, Sep 29, 2020 at 5:5:41, Z.q. Hou <zhiqiang.hou@xxxxxxx>
> wrote:
> >>>>
> >>>>> Hi Lorenzo,
> >>>>>
> >>>>> Thanks a lot for your comments!
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> >>>>>> Sent: 2020年9月28日 17:39
> >>>>>> To: Z.q. Hou <zhiqiang.hou@xxxxxxx>
> >>>>>> Cc: Rob Herring <robh@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> >>>>>> PCI <linux-pci@xxxxxxxxxxxxxxx>; Bjorn Helgaas
> >>>>>> <bhelgaas@xxxxxxxxxx>; Gustavo Pimentel
> >>>>>> <gustavo.pimentel@xxxxxxxxxxxx>; Michael Walle
> >>>>>> <michael@xxxxxxxx>; Ard Biesheuvel <ardb@xxxxxxxxxx>
> >>>>>> Subject: Re: [PATCH] PCI: dwc: Added link up check in map_bus of
> >>>>>> dw_child_pcie_ops
> >>>>>>
> >>>>>> On Thu, Sep 24, 2020 at 04:24:47AM +0000, Z.q. Hou wrote:
> >>>>>>> Hi Rob,
> >>>>>>>
> >>>>>>> Thanks a lot for your comments!
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Rob Herring <robh@xxxxxxxxxx>
> >>>>>>>> Sent: 2020年9月18日 23:28
> >>>>>>>> To: Z.q. Hou <zhiqiang.hou@xxxxxxx>
> >>>>>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx; PCI
> >>>>>>>> <linux-pci@xxxxxxxxxxxxxxx>; Lorenzo Pieralisi
> >>>>>>>> <lorenzo.pieralisi@xxxxxxx>; Bjorn Helgaas
> >>>>>>>> <bhelgaas@xxxxxxxxxx>; Gustavo Pimentel
> >>>>>>>> <gustavo.pimentel@xxxxxxxxxxxx>; Michael Walle
> >>>>>> <michael@xxxxxxxx>;
> >>>>>>>> Ard Biesheuvel <ardb@xxxxxxxxxx>
> >>>>>>>> Subject: Re: [PATCH] PCI: dwc: Added link up check in map_bus
> >>>>>>>> of dw_child_pcie_ops
> >>>>>>>>
> >>>>>>>> On Fri, Sep 18, 2020 at 5:02 AM Z.q. Hou
> <zhiqiang.hou@xxxxxxx>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Rob,
> >>>>>>>>>
> >>>>>>>>> Thanks a lot for your comments!
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Rob Herring <robh@xxxxxxxxxx>
> >>>>>>>>>> Sent: 2020年9月17日 4:29
> >>>>>>>>>> To: Z.q. Hou <zhiqiang.hou@xxxxxxx>
> >>>>>>>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx; PCI
> >>>>>>>>>> <linux-pci@xxxxxxxxxxxxxxx>; Lorenzo Pieralisi
> >>>>>>>>>> <lorenzo.pieralisi@xxxxxxx>; Bjorn Helgaas
> >>>>>>>>>> <bhelgaas@xxxxxxxxxx>; Gustavo Pimentel
> >>>>>>>>>> <gustavo.pimentel@xxxxxxxxxxxx>; Michael Walle
> >>>>>>>> <michael@xxxxxxxx>;
> >>>>>>>>>> Ard Biesheuvel <ardb@xxxxxxxxxx>
> >>>>>>>>>> Subject: Re: [PATCH] PCI: dwc: Added link up check in map_bus
> >>>>>>>>>> of dw_child_pcie_ops
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Sep 15, 2020 at 11:49 PM Zhiqiang Hou
> >>>>>>>> <Zhiqiang.Hou@xxxxxxx>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> From: Hou Zhiqiang <Zhiqiang.Hou@xxxxxxx>
> >>>>>>>>>>>
> >>>>>>>>>>> On NXP Layerscape platforms, it results in SError in the
> >>>>>>>>>>> enumeration of the PCIe controller, which is not connecting
> >>>>>>>>>>> with an Endpoint device. And it doesn't make sense to
> >>>>>>>>>>> enumerate the Endpoints when the PCIe link is down. So this
> >>>>>>>>>>> patch added the link up check to avoid to fire configuration
> >>>>>> transactions on link down bus.
> >>>>>>>>>>
> >>>>>>>>>> Michael reported the same issue as well.
> >>>>>>>>>>
> >>>>>>>>>> What happens if the link goes down between the check and the
> >>>>>> access?
> >>>>>>>>>
> >>>>>>>>> This patch cannot cover this case, and will get the SError.
> >>>>>>>>> But I think it makes sense to avoid firing transactions on link
> down bus.
> >>>>>>>>
> >>>>>>>> That's impossible to do without a race even in h/w.
> >>>>>>>
> >>>>>>> Agree.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>> It's a racy check. I'd like to find an alternative solution.
> >>>>>>>>>> It's even worse if Layerscape is used in ECAM mode. I looked
> >>>>>>>>>> at the EDK2 setup for layerscape[1] and it looks like root
> >>>>>>>>>> ports are just skipped if link
> >>>>>>>> is down.
> >>>>>>>>>> Maybe a link down just never happens once up, but if so, then
> >>>>>>>>>> we only need to check it once and fail probe.
> >>>>>>>>>
> >>>>>>>>> Many customers connect the FPGA Endpoint, which may
> establish
> >>>>>>>>> PCIe link after the PCIe enumeration and then rescan the PCIe
> >>>>>>>>> bus, so I think it should not exit the probe of root port even
> >>>>>>>>> if there is not link up
> >>>>>>>> during enumeration.
> >>>>>>>>
> >>>>>>>> That's a good reason. I want to unify the behavior here as it
> >>>>>>>> varies per platform currently and wasn't sure which way to go.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>> I've dug into this a bit more and am curious about the
> >>>>>>>>>> PCIE_ABSERR register setting which is set to:
> >>>>>>>>>>
> >>>>>>>>>> #define PCIE_ABSERR_SETTING 0x9401 /* Forward error of
> >>>>>>>>>> non-posted request */
> >>>>>>>>>>
> >>>>>>>>>> It seems to me this is not what we want at least for config
> >>>>>>>>>> accesses, but commit 84d897d6993 where this was added
> seems
> >>>>>>>>>> to say otherwise. Is it not possible to configure the
> >>>>>>>>>> response per access
> >>>>>> type?
> >>>>>>>>>
> >>>>>>>>> Thanks a lot for your investigation!
> >>>>>>>>> The story is like this: Some customers worry about these
> >>>>>>>>> silent error (DWC PCIe IP won't forward the error of outbound
> >>>>>>>>> non-post request by default), so we were pushed to enable the
> >>>>>>>>> error forwarding to AXI in the commit
> >>>>>>>>> 84d897d6993 as you saw. But it cannot differentiate the config
> >>>>>>>>> transactions from the MEM_rd, except the Vendor ID access,
> >>>>>>>>> which is controlled by a separate bit and it was set to not
> >>>>>>>>> forward error of access
> >>>>>>>> of Vendor ID.
> >>>>>>>>> So we think it's okay to enable the error forwarding, the
> >>>>>>>>> SError should not occur, because after the enumeration it
> >>>>>>>>> won't access the
> >>>>>>>> non-existent functions.
> >>>>>>>>
> >>>>>>>> We've rejected upstream support for platforms aborting on
> >>>>>>>> config accesses[1]. I think there's clear consensus that
> >>>>>>>> aborting is the wrong behavior.
> >>>>>>>>
> >>>>>>>> Do MEM_wr errors get forwarded? Seems like that would be
> enough.
> >>>>>>>> Also, wouldn't page faults catch most OOB accesses anyways?
> You
> >>>>>>>> need things page aligned anyways with an IOMMU and doing
> >>>>>>>> userspace access or guest assignment.
> >>>>>>>
> >>>>>>> Yes, errors of MEM_wr can be forwarded.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Here's another idea, how about only enabling forwarding errors
> >>>>>>>> if the link is up? If really would need to be configured any
> >>>>>>>> time the link state changes rather than just at probe. I'm not
> >>>>>>>> sure if you have a way to disable it on link down though.
> >>>>>>>
> >>>>>>> Dug deeper into this issue and found the setting of not
> >>>>>>> forwarding error of non-existent Vender ID access counts on the
> link partner: 1.
> >>>>>>> When there is a link partner (namely link up), it will return
> >>>>>>> 0xffff when read non-existent function Vendor ID and won't
> >>>>>>> forward error to AXI. 2. When no link partner (link down), it
> >>>>>>> will forward the error of reading non-existent function Vendor
> >>>>>>> ID to AXI and result in SError.
> >>>>>>>
> >>>>>>> I think this is a DWC PCIe IP specific issue but not get
> >>>>>>> feedback from design team. I'm thinking to disable this error
> >>>>>>> forwarding just like other platforms, since when these errors
> >>>>>>> (UR, CA and CT) are detected, AER driver can also report the error
> and try to recover.
> >>>>>>
> >>>>>> I take this as you shall send a patch to fix this issue shortly, is this
> correct ?
> >>>>>
> >>>>> The issue becomes complex:
> >>>>> I reviewed the DWC PCIe databook of verion 4.40a which is used on
> Layerscape platforms, and it said that " Your RC application should not
> generate CFG requests until it has confirmed that the link is up by sampling
> the smlh_link_up and rmlh_link_up outputs".
> >>>>> So, the link up checking should not be remove before each outbound
> CFG access.
> >>>>> Gustavo, can you share more details on the link up checking? Does it
> only exist in the 4.40a?
> >>>>
> >>>> Hi Zhiqiang,
> >>>>
> >>>> According to the information that I got from the IP team you are
> >>>> correct, the same requirement still exists on the newer IP versions.
> >>>
> >>> How is that possible in a race free way?
> >>>
> >>> Testing on meson and layerscape (with the forwarding of errors
> >>> disabled) shows a link check is not needed. But then dra7xx seems to
> >>> need one (or has some f/w setup).
> >>
> >> Yeah, I don't see any registers in the DRA7x PCIe wrapper for
> >> disabling error forwarding.
> >
> > It's a DWC port logic register AFAICT, but perhaps not present in all
> versions.
>
> Okay. I see there's a register PCIECTRL_PL_AXIS_SLV_ERR_RESP which has a
> reset value of 0.
>
> It has four bit-fields, RESET_TIMEOUT_ERR_MAP, NO_VID_ERR_MAP,
> DBI_ERR_MAP and SLAVE_ERR_MAP. I'm not seeing any difference in
> behavior if I set all these bits. Maybe it requires platform support too. I'll
> check this with our design team.

In DWC v4.40a databook, there is a bit AMBA_ERROR_RESPONSE_GLOBAL which controls if enable the error forwarding. The *MAP bits only determine which error (SLVERR or DECERR) will be forwarded to AXI/AHB bus.

Thanks,
Zhiqiang

>
> Meanwhile would it be okay to add linkup check atleast for DRA7X so that
> we could have it booting in linux-next?
>
> Thanks
> Kishon