Re: [PATCH v3 3/8] PCI: brcmstb: Add Broadcom STB PCIe host controller driver

From: Bjorn Helgaas
Date: Fri Dec 15 2017 - 15:14:48 EST


On Fri, Dec 15, 2017 at 02:53:57PM -0500, Jim Quinlan wrote:
> On Thu, Dec 14, 2017 at 5:51 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Wed, Dec 13, 2017 at 06:53:53PM -0500, Jim Quinlan wrote:
> >> On Tue, Dec 12, 2017 at 5:16 PM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >> > On Tue, Nov 14, 2017 at 05:12:07PM -0500, Jim Quinlan wrote:
> >> >> This commit adds the basic Broadcom STB PCIe controller. Missing is
> >> >> the ability to process MSI and also handle dma-ranges for inbound
> >> >> memory accesses. These two functionalities are added in subsequent
> >> >> commits.
> >> >>
> >> >> The PCIe block contains an MDIO interface. This is a local interface
> >> >> only accessible by the PCIe controller. It cannot be used or shared
> >> >> by any other HW. As such, the small amount of code for this
> >> >> controller is included in this driver as there is little upside to put
> >> >> it elsewhere.
> >> ...
> >
> >> >> +static bool brcm_pcie_valid_device(struct brcm_pcie *pcie, struct pci_bus *bus,
> >> >> + int dev)
> >> >> +{
> >> >> + if (pci_is_root_bus(bus)) {
> >> >> + if (dev > 0)
> >> >> + return false;
> >> >> + } else {
> >> >> + /* If there is no link, then there is no device */
> >> >> + if (!brcm_pcie_link_up(pcie))
> >> >> + return false;
> >> >
> >> > This is racy, since the link can go down after you check but before
> >> > you do the config access. I assume your hardware can deal with a
> >> > config access that targets a link that is down?
> >>
> >> Yes, that can happen but there is really nothing that can be done if
> >> the link goes down in that vulnerability window. What do you suggest
> >> doing?
> >
> > Most hardware drops writes and returns ~0 on reads if the link is
> > down. I assume your hardware does something similar, and that should
> > be enough. You shouldn't have to check whether the link is up.
> Unfortunately our HW is quite unforgiving and effects a synchronous or
> asynchronous abort when doing a config access when the link or power
> has gone down on the EP. I will open a discussion with the PCIe HW
> folks regarding why our controller does not behave like "most
> hardware". Thanks, Jim

I mentioned in the other thread [1] that I think the best way to
handle this is to figure out how to deal with the abort cleanly.
Using a test like *_pcie_link_up() to try to avoid it is a 99%
solution that means we'll see rare failures that are very difficult to
reproduce and debug.

> > The hardware might report errors, e.g., via AER, if the link is down.
> > And we might not not handle those nicely. If we have issues there, we
> > should find out and fix them.

[1] https://lkml.kernel.org/r/20171214225821.GN30595@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx