Re: [PATCH 2/5] iov: Reset resources to 0 if totalVFs increases after enabling ARI

From: Alexander Duyck
Date: Wed Oct 28 2015 - 14:32:21 EST


On 10/28/2015 09:37 AM, Bjorn Helgaas wrote:
Hi Alex,

On Tue, Oct 27, 2015 at 01:52:21PM -0700, Alexander Duyck wrote:
This patch forces us to reallocate VF BARs if the totalVFs value has
increased after enabling ARI. This normally shouldn't occur, however I
have seen some non-spec devices that shift between 7 and some value greater
than 7 based on the ARI value and we want to avoid triggering any issues
with such devices.

Can you include specifics about the devices? The value "7" is pretty
specific, so if we're going to include that level of detail, we should
have the actual device info to go with it.

I referenced 7 as that is the largest number of VFs a single function can support assuming a single function without ARI and without the ability to handle Type 1 configuration requests. The Intel fm10k driver has logic in it that does a check for ARI and if it is supported it reports via sysfs a totalVFs of 64, otherwise it limits the totalVFs reported to 7. However, I don't believe it exposes the limitation via the configuration space.

I guess the problem is:

- Device supports 7 TotalVFs with ARI disabled, >7 with ARI enabled
- Firmware leaves ARI disabled in SRIOV_CTRL
- Firmware computes size based on 7 VFs
- Firmware allocates space and programs BARs for 7 VFs
- Linux enables ARI, reads >7 TotalVFs
- Linux computes size based on >7 VFs
- Increased size may overlap other resources

Right?

Right. More than likely what will happen is that you will see overlap of the device on itself if it has multiple base address registers assigned to the VFs.

Fixes: 3aa71da412fe ("PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs")
Signed-off-by: Alexander Duyck <aduyck@xxxxxxxxxxxx>
---
drivers/pci/iov.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 099050d78a39..238950412de0 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -393,7 +393,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
int rc;
int nres;
u32 pgsz;
- u16 ctrl, total;
+ u16 ctrl, total, orig_total;
struct pci_sriov *iov;
struct resource *res;
struct pci_dev *pdev;
@@ -402,6 +402,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT)
return -ENODEV;

+ pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &orig_total);
pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
if (ctrl & PCI_SRIOV_CTRL_VFE) {
pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
@@ -450,6 +451,14 @@ found:
}
iov->barsz[i] = resource_size(res);
res->end = res->start + resource_size(res) * total - 1;
+
+ /* force reallocation of BARs if total VFs increased */
+ if (orig_total < total) {
+ res->flags |= IORESOURCE_UNSET;
+ res->end -= res->start;
+ res->start = 0;
+ }

Two thoughts here:

1) Even if the required space increased, it's possible that firmware
placed the BAR somewhere where the extra space is available. In that
case, this forces reallocation unnecessarily.

I'd say it is possible, but not likely. From past experience I have seen BIOSes do some very dumb things when it comes to SR-IOV, assuming they even support it.

In addition many of the VF devices out there support more than one base address register per function. The Intel NICs for example have one for device registers and one for MSI-X registers. And most BIOSes usually pack one right after the other from what I have seen. So while there may be more space there what usually happens is that the MSI-X region will have to be relocated in order to make room for expanding the other base address register.

My last bit on all this is that VFs are meant to be assigned into guests. I would argue that for the sake of security we are much better off invalidating the VF base address registers and forcing a reallocation if there is even a risk of the VF base address register space overlapping with some other piece of host memory. We don't want to risk possibly exposing any bits of the host that we didn't intend on.

2) This *feels* like something the PCI core should be doing anyway,
even without any help here. Shouldn't we fail in pci_claim_resource()
and set IORESOURCE_UNSET there?

OK, and a third: re-reading TotalVF is (I think) completely
unnecessary per spec, so if we're going to do it we should probably
have a one-line comment about why the code is doing something that
appears unnecessary, and really should have a concrete example device
in the changelog.

I'm perfectly fine with us adding a comment about this being needed for out-of-spec devices on BIOSes that may not fully support SR-IOV. However, I wasn't the one that had the device that needed the workaround for totalVFs and so I cannot provide a concrete example myself.

I've added Ben to the Cc as he was the author for the original patch that introduced this possible issue. Perhaps he can provide us with more details on the hardware that needs this.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/