Re: [PATCH 2/5] iov: Reset resources to 0 if totalVFs increases after enabling ARI

From: Bjorn Helgaas
Date: Wed Oct 28 2015 - 15:53:10 EST


On Wed, Oct 28, 2015 at 11:32:14AM -0700, Alexander Duyck wrote:
> On 10/28/2015 09:37 AM, Bjorn Helgaas wrote:
> >Hi Alex,
> >
> >On Tue, Oct 27, 2015 at 01:52:21PM -0700, Alexander Duyck wrote:
> >>This patch forces us to reallocate VF BARs if the totalVFs value has
> >>increased after enabling ARI. This normally shouldn't occur, however I
> >>have seen some non-spec devices that shift between 7 and some value greater
> >>than 7 based on the ARI value and we want to avoid triggering any issues
> >>with such devices.
> >
> >Can you include specifics about the devices? The value "7" is pretty
> >specific, so if we're going to include that level of detail, we should
> >have the actual device info to go with it.
>
> I referenced 7 as that is the largest number of VFs a single
> function can support assuming a single function without ARI and
> without the ability to handle Type 1 configuration requests. The
> Intel fm10k driver has logic in it that does a check for ARI and if
> it is supported it reports via sysfs a totalVFs of 64, otherwise it
> limits the totalVFs reported to 7. However, I don't believe it
> exposes the limitation via the configuration space.

Ah, OK, that makes sense.

> >I guess the problem is:
> >
> > - Device supports 7 TotalVFs with ARI disabled, >7 with ARI enabled
> > - Firmware leaves ARI disabled in SRIOV_CTRL
> > - Firmware computes size based on 7 VFs
> > - Firmware allocates space and programs BARs for 7 VFs
> > - Linux enables ARI, reads >7 TotalVFs
> > - Linux computes size based on >7 VFs
> > - Increased size may overlap other resources
> >
> >Right?
>
> Right. More than likely what will happen is that you will see
> overlap of the device on itself if it has multiple base address
> registers assigned to the VFs.
>
> >>Fixes: 3aa71da412fe ("PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs")
> >>Signed-off-by: Alexander Duyck <aduyck@xxxxxxxxxxxx>
> >>---
> >> drivers/pci/iov.c | 11 ++++++++++-
> >> 1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >>index 099050d78a39..238950412de0 100644
> >>--- a/drivers/pci/iov.c
> >>+++ b/drivers/pci/iov.c
> >>@@ -393,7 +393,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >> int rc;
> >> int nres;
> >> u32 pgsz;
> >>- u16 ctrl, total;
> >>+ u16 ctrl, total, orig_total;
> >> struct pci_sriov *iov;
> >> struct resource *res;
> >> struct pci_dev *pdev;
> >>@@ -402,6 +402,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >> pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT)
> >> return -ENODEV;
> >>
> >>+ pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &orig_total);
> >> pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
> >> if (ctrl & PCI_SRIOV_CTRL_VFE) {
> >> pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
> >>@@ -450,6 +451,14 @@ found:
> >> }
> >> iov->barsz[i] = resource_size(res);
> >> res->end = res->start + resource_size(res) * total - 1;
> >>+
> >>+ /* force reallocation of BARs if total VFs increased */
> >>+ if (orig_total < total) {
> >>+ res->flags |= IORESOURCE_UNSET;
> >>+ res->end -= res->start;
> >>+ res->start = 0;
> >>+ }
> >
> >Two thoughts here:
> >
> >1) Even if the required space increased, it's possible that firmware
> >placed the BAR somewhere where the extra space is available. In that
> >case, this forces reallocation unnecessarily.
>
> I'd say it is possible, but not likely. From past experience I have
> seen BIOSes do some very dumb things when it comes to SR-IOV,
> assuming they even support it.
>
> In addition many of the VF devices out there support more than one
> base address register per function. The Intel NICs for example have
> one for device registers and one for MSI-X registers. And most
> BIOSes usually pack one right after the other from what I have seen.
> So while there may be more space there what usually happens is that
> the MSI-X region will have to be relocated in order to make room for
> expanding the other base address register.
>
> My last bit on all this is that VFs are meant to be assigned into
> guests. I would argue that for the sake of security we are much
> better off invalidating the VF base address registers and forcing a
> reallocation if there is even a risk of the VF base address register
> space overlapping with some other piece of host memory. We don't
> want to risk possibly exposing any bits of the host that we didn't
> intend on.

Agreed, not likely for several reasons.

> >2) This *feels* like something the PCI core should be doing anyway,
> >even without any help here. Shouldn't we fail in pci_claim_resource()
> >and set IORESOURCE_UNSET there?

This is really the core of my question -- what problem does this patch
solve? I'm trying to figure out if delaying the read of TotalVFs
until after we set ARI Capable Hierarchy is sufficient, and if it's
not sufficient, *why* not?

> >OK, and a third: re-reading TotalVF is (I think) completely
> >unnecessary per spec, so if we're going to do it we should probably
> >have a one-line comment about why the code is doing something that
> >appears unnecessary, and really should have a concrete example device
> >in the changelog.
>
> I'm perfectly fine with us adding a comment about this being needed
> for out-of-spec devices on BIOSes that may not fully support SR-IOV.
> However, I wasn't the one that had the device that needed the
> workaround for totalVFs and so I cannot provide a concrete example
> myself.
>
> I've added Ben to the Cc as he was the author for the original patch
> that introduced this possible issue. Perhaps he can provide us with
> more details on the hardware that needs this.
>
> - Alex
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/