Re: [PATCH v5 1/2] PCI: iproc: Retry request when CRS returned from EP

From: Bjorn Helgaas
Date: Thu Aug 03 2017 - 14:57:22 EST


On Thu, Aug 03, 2017 at 01:50:29PM +0530, Oza Oza wrote:
> On Thu, Aug 3, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Thu, Jul 06, 2017 at 08:39:41AM +0530, Oza Pawandeep wrote:
> >> For Configuration Requests only, following reset it is possible for a
> >> device to terminate the request but indicate that it is temporarily unable
> >> to process the Request, but will be able to process the Request in the
> >> future â in this case, the Configuration Request Retry Status 100 (CRS)
> >> Completion Status is used.
> >
> > Please include the spec reference for this.
> >
> > The "100" looks like it's supposed to be the Completion Status Field
> > value for CRS from PCIe r3.1, sec 2.2.9, Table 2-34, but the value in
> > the table is actually 0b010, not 0b100. I don't think this level of
> > detail is necessary unless your hardware exposes those values to the
> > driver.
> >
>
> I will remove the above description from the comment.
>
> >> As per PCI spec, CRS Software Visibility only affects config read of the
> >> Vendor ID, for config write or any other config read the Root must
> >> automatically re-issue configuration request again as a new request.
> >> Iproc based PCIe RC (hw) does not retry request on its own.
> >
> > I think sec 2.3.2 is the relevant part of the spec. It basically
> > says that when an RC receives a CRS completion for a config request:
> >
> > - If CRS software visibility is not enabled, the RC must reissue the
> > config request as a new request.
> >
> > - If CRS software visibility is enabled,
> > - for a config read of Vendor ID, the RC must return 0x0001 data
> > - for all other config reads/writes, the RC must reissue the
> > request
> >
>
> yes, above is more relevant, and I will include it in the description.
>
> > Apparently iProc *never* reissues config requests, regardless of
> > whether CRS software visibility is enabled?
> >
>
> that is true.
>
> > But your CFG_RETRY_STATUS definition suggests that iProc always
> > fabricates config read data of 0xffff0001 when it sees CRS status, no
> > matter whether software visibility is enabled and no matter what
> > config address we read?
>
> yes, that is precisely the case.

Not according to the documentation you quoted below.

> > What about CRS status for a config *write*? There's nothing here to
> > reissue those.
>
> No, we do not need there, because read will always be issued first
> before any write.
> so we do not need to implement write.

How so? As far as I know, there's nothing in the spec that requires
the first config access to a device to be a read, and there are
reasons why we might want to do a write first:
http://lkml.kernel.org/r/5952D144.8060609@xxxxxxxxxx

> > Is there a hardware erratum that describes the actual hardware
> > behavior?
>
> this is documented in our PCIe core hw manual.
> >>
> 4.7.3.3. Retry Status On Configuration Cycle
> Endpoints are allowed to generate retry status on configuration
> cycles. In this case, the RC needs to re-issue the request. The IP
> does not handle this because the number of configuration cycles needed
> will probably be less
> than the total number of non-posted operations needed. When a retry status
> is received on the User RX interface for a configuration request that
> was sent on the User TX interface, it will be indicated with a
> completion with the CMPL_STATUS field set to 2=CRS, and the user will
> have to find the address and data values and send a new transaction on
> the User TX interface.
> When the internal configuration space returns a retry status during a
> configuration cycle (user_cscfg = 1) on the Command/Status interface,
> the pcie_cscrs will assert with the pcie_csack signal to indicate the
> CRS status.
> When the CRS Software Visibility Enable register in the Root Control
> register is enabled, the IP will return the data value to 0x0001 for
> the Vendor ID value and 0xffff (all 1âs) for the rest of the data in
> the request for reads of offset 0 that return with CRS status. This
> is true for both the User RX Interface and for the Command/Status
> interface. When CRS Software Visibility is enabled, the CMPL_STATUS
> field of the completion on the User RX Interface will not be 2=CRS and
> the pcie_cscrs signal will not assert on the Command/Status interface.
> >>
> Broadcom does not sell PCIe core IP, so above information is not
> publicly available in terms of hardware erratum or any similar note.
>
>
> >
> >> As a result of the fact, PCIe RC driver (sw) should take care of CRS.
> >> This patch fixes the problem, and attempts to read config space again in
> >> case of PCIe code forwarding the CRS back to CPU.
> >> It implements iproc_pcie_config_read which gets called for Stingray,
> >> Otherwise it falls back to PCI generic APIs.
> >>
> >> Signed-off-by: Oza Pawandeep <oza.oza@xxxxxxxxxxxx>
> >> Reviewed-by: Ray Jui <ray.jui@xxxxxxxxxxxx>
> >> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx>
> >>
> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> >> index 0f39bd2..b0abcd7 100644
> >> --- a/drivers/pci/host/pcie-iproc.c
> >> +++ b/drivers/pci/host/pcie-iproc.c
> >> @@ -68,6 +68,9 @@
> >> #define APB_ERR_EN_SHIFT 0
> >> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
> >>
> >> +#define CFG_RETRY_STATUS 0xffff0001
> >> +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */
> >> +
> >> /* derive the enum index of the outbound/inbound mapping registers */
> >> #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2)
> >>
> >> @@ -448,6 +451,55 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus,
> >> }
> >> }
> >>
> >> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> >> +{
> >> + int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
> >> + unsigned int ret;
> >> +
> >> + /*
> >> + * As per PCI spec, CRS Software Visibility only
> >> + * affects config read of the Vendor ID.
> >> + * For config write or any other config read the Root must
> >> + * automatically re-issue configuration request again as a
> >> + * new request. Iproc based PCIe RC (hw) does not retry
> >> + * request on its own, so handle it here.
> >> + */
> >> + do {
> >> + ret = readl(cfg_data_p);
> >> + if (ret == CFG_RETRY_STATUS)
> >> + udelay(1);
> >> + else
> >> + return PCIBIOS_SUCCESSFUL;
> >> + } while (timeout--);
> >
> > Shouldn't this check *where* in config space we're reading?
> >
> No, we do not require, because with SPDK it was reading BAR space
> which points to MSI-x table.
> what I mean is, it could be anywhere.

The documentation above says the IP returns data of 0x0001 for *reads
of offset 0*. Your current code reissues the read if *any* read
anywhere returns data that happens to match CFG_RETRY_STATUS. That
may be a perfectly valid register value for some device's config
space. In that case, you do not want to reissue the read and you do
not want to timeout.

> > Shouldn't we check whether CRS software visiblity is enabled? Does
> > iProc advertise CRS software visibility support in Root Capabilities?
>
> No, this is also not required because irrespective of that, IP will
> re-issue the request, and it will forwards error code to host bridge.
> and in-turn host-bridge is implemented to return 0xffff0001 as code.

The documentation says the IP does not handle reissuing the request.

> > If CRS software visibility is enabled, we should return the 0x0001
> > data if we're reading the Vendor ID and see CRS status. It looks like
> > this loop always retries if we see 0xffff0001 data.
> >
> our internal host bridge is implemented this way, and it returns 0xffff0001.
> so there, we have no choice.

The PCIe spec says that if CRS software visibility is enabled and we
get a CRS completion for a config read of the Vendor ID at offset 0,
the read should not be retried, and the read should return 0x0001.
For example, pci_bus_read_dev_vendor_id() should see that 0x0001
value.

That's not what this code does. If the readl() returns
CFG_RETRY_STATUS, you do the delay and reissue the read. You never
return 0x0001 to the caller of pci_bus_read_config_word().

> As I understand from your comments that, I have to change comment
> description and include right PCI spec reference with section.
> apart form that I have tried to answer your IP specific details.

Please include the documentation quote above in the changelog when you
revise this patch.

> >> +
> >> + return PCIBIOS_DEVICE_NOT_FOUND;
> >> +}
> >> +
> >> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
> >> + unsigned int busno,
> >> + unsigned int slot,
> >> + unsigned int fn,
> >> + int where)
> >> +{
> >> + u16 offset;
> >> + u32 val;
> >> +
> >> + /* EP device access */
> >> + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) |
> >> + (slot << CFG_ADDR_DEV_NUM_SHIFT) |
> >> + (fn << CFG_ADDR_FUNC_NUM_SHIFT) |
> >> + (where & CFG_ADDR_REG_NUM_MASK) |
> >> + (1 & CFG_ADDR_CFG_TYPE_MASK);
> >> +
> >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val);
> >> + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA);
> >> +
> >> + if (iproc_pcie_reg_is_invalid(offset))
> >> + return NULL;
> >> +
> >> + return (pcie->base + offset);
> >> +}
> >> +
> >> /**
> >> * Note access to the configuration registers are protected at the higher layer
> >> * by 'pci_lock' in drivers/pci/access.c
> >> @@ -499,13 +551,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus,
> >> return (pcie->base + offset);
> >> }
> >>
> >> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
> >> + int where, int size, u32 *val)
> >> +{
> >> + struct iproc_pcie *pcie = iproc_data(bus);
> >> + unsigned int slot = PCI_SLOT(devfn);
> >> + unsigned int fn = PCI_FUNC(devfn);
> >> + unsigned int busno = bus->number;
> >> + void __iomem *cfg_data_p;
> >> + int ret;
> >> +
> >> + /* root complex access. */
> >> + if (busno == 0)
> >> + return pci_generic_config_read32(bus, devfn, where, size, val);
> >> +
> >> + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where);
> >> +
> >> + if (!cfg_data_p)
> >> + return PCIBIOS_DEVICE_NOT_FOUND;
> >> +
> >> + ret = iproc_pcie_cfg_retry(cfg_data_p);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + *val = readl(cfg_data_p);
> >> +
> >> + if (size <= 2)
> >> + *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1);
> >> +
> >> + return PCIBIOS_SUCCESSFUL;
> >> +}
> >> +
> >> static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
> >> int where, int size, u32 *val)
> >> {
> >> int ret;
> >> + struct iproc_pcie *pcie = iproc_data(bus);
> >>
> >> iproc_pcie_apb_err_disable(bus, true);
> >> - ret = pci_generic_config_read32(bus, devfn, where, size, val);
> >> + if (pcie->type == IPROC_PCIE_PAXB_V2)
> >> + ret = iproc_pcie_config_read(bus, devfn, where, size, val);
> >> + else
> >> + ret = pci_generic_config_read32(bus, devfn, where, size, val);
> >> iproc_pcie_apb_err_disable(bus, false);
> >>
> >> return ret;
> >> --
> >> 1.9.1
> >>