[PATCH] vfio/pci: Virtualize Maximum Payload Size

From: Alex Williamson
Date: Tue Sep 19 2017 - 12:58:15 EST

With virtual PCI-Express chipsets, we now see userspace/guest drivers
trying to match the physical MPS setting to a virtual downstream port.
Of course a lone physical device surrounded by virtual interconnects
cannot make a correct decision for a proper MPS setting. Instead,
let's virtualize the MPS control register so that writes through to
hardware are disallowed. Userspace drivers like QEMU assume they can
write anything to the device and we'll filter out anything dangerous.
Since mismatched MPS can lead to AER and other faults, let's add it
to the kernel side rather than relying on userspace virtualization to
handle it.

Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
---

Do we have any reason to suspect that a userspace driver has any
dependencies on the physical MPS setting or is this only tuning the
protocol layer and it's transparent to the driver? Note that per the
PCI spec, a device supporting only 128B MPS can hardwire the control
register to 000b, but it doesn't seem PCIe compliant to hardwire it to
any given value, such as would be the appearance if we exposed this as
a read-only register rather than virtualizing it. QEMU would then be
responsible for virtualizing it, which makes coordinating the upgrade

drivers/vfio/pci/vfio_pci_config.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 5628fe114347..91335e6de88a 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -849,11 +849,13 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm)

 	 * Allow writes to device control fields, except devctl_phantom,
-	 * which could confuse IOMMU, and the ARI bit in devctl2, which
+	 * which could confuse IOMMU, MPS, which can break communication
+	 * with other physical devices, and the ARI bit in devctl2, which
 	 * is set at probe time.  FLR gets virtualized via our writefn.
 	p_setw(perm, PCI_EXP_DEVCTL,
 	return 0;
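
For readers unfamiliar with what "virtualizing" the MPS field means in practice, the following is a minimal standalone user-space sketch, not the vfio code itself. It assumes a simplified model in which each register has a "virt" mask (bits kept in a shadow copy and never forwarded to hardware) and a "write" mask (bits the user may change at all). The struct reg16 type and the filtered_write(), filtered_read(), mps_bytes() and mps_virtualization_demo() helpers are hypothetical names invented for this illustration, and the mask values used in the demo are an assumption chosen to match the comment in the hunk above. Only the bit definitions mirror the PCI_EXP_DEVCTL_* values in pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values mirror PCI_EXP_DEVCTL_* in include/uapi/linux/pci_regs.h. */
#define DEVCTL_PAYLOAD	0x00e0u	/* Max Payload Size field, bits 7:5 */
#define DEVCTL_PHANTOM	0x0200u	/* Phantom Functions Enable */
#define DEVCTL_BCR_FLR	0x8000u	/* Bridge Config Retry / FLR */

/* Hypothetical model: one 16-bit register plus its shadow copy. */
struct reg16 {
	uint16_t hw;		/* value held by the physical device */
	uint16_t vconfig;	/* value held in the virtual config space */
};

/*
 * Apply a user write. Bits that are both virtualized and writable land
 * only in the shadow copy; writable, non-virtualized bits reach hardware;
 * everything else is dropped.
 */
static void filtered_write(struct reg16 *r, uint16_t val,
			   uint16_t virt, uint16_t write)
{
	uint16_t virt_w = virt & write;
	uint16_t phys_w = (uint16_t)~virt & write;

	r->vconfig = (uint16_t)((r->vconfig & ~virt_w) | (val & virt_w));
	r->hw = (uint16_t)((r->hw & ~phys_w) | (val & phys_w));
}

/* A user read merges the shadow bits over the hardware value. */
static uint16_t filtered_read(const struct reg16 *r, uint16_t virt)
{
	return (uint16_t)((r->hw & ~virt) | (r->vconfig & virt));
}

/* MPS encoding: 000b = 128 bytes, 001b = 256 bytes, and so on. */
static unsigned int mps_bytes(uint16_t devctl)
{
	return 128u << ((devctl & DEVCTL_PAYLOAD) >> 5);
}

/* Walk through one guest write; returns 0 if the model behaves as described. */
static int mps_virtualization_demo(void)
{
	const uint16_t virt = DEVCTL_BCR_FLR | DEVCTL_PAYLOAD;	/* assumed masks */
	const uint16_t write = (uint16_t)~DEVCTL_PHANTOM;
	struct reg16 devctl = { .hw = 0x0000, .vconfig = 0x0000 };

	/* Guest requests MPS = 256 bytes (001b), sets bit 0, and tries phantom. */
	filtered_write(&devctl, 0x0020 | 0x0001 | DEVCTL_PHANTOM, virt, write);

	if (mps_bytes(devctl.hw) != 128)	/* physical MPS is untouched */
		return -1;
	if (mps_bytes(filtered_read(&devctl, virt)) != 256)	/* guest sees its value */
		return -1;
	if (filtered_read(&devctl, virt) & DEVCTL_PHANTOM)	/* phantom was dropped */
		return -1;
	return 0;
}
```

In this model the guest reads back the MPS value it wrote, so a virtual downstream port and the assigned device appear consistent to the guest, while the physical setting stays whatever the host configured.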