Re: [PATCH] vfio/pci: Virtualize Maximum Payload Size

From: Auger Eric
Date: Tue Sep 19 2017 - 13:50:47 EST


Hi Alex,

On 19/09/2017 18:58, Alex Williamson wrote:
> With virtual PCI-Express chipsets, we now see userspace/guest drivers
> trying to match the physical MPS setting to a virtual downstream port.
> Of course a lone physical device surrounded by virtual interconnects
> cannot make a correct decision for a proper MPS setting. Instead,
> let's virtualize the MPS control register so that writes through to
> hardware are disallowed. Userspace drivers like QEMU assume they can
> write anything to the device and we'll filter out anything dangerous.
> Since mismatched MPS can lead to AER and other faults, let's add it
> to the kernel side rather than relying on userspace virtualization to
> handle it.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>
> Do we have any reason to suspect that a userspace driver has any
> dependencies on the physical MPS setting or is this only tuning the
> protocol layer and it's transparent to the driver? Note that per the
> PCI spec, a device supporting only 128B MPS can hardwire the control
> register to 000b, but it doesn't seem PCIe compliant to hardwire it to
> any given value, such as would be the appearance if we exposed this as
> a read-only register rather than virtualizing it. QEMU would then be
> responsible for virtualizing it, which makes coordinating the upgrade
> troublesome.
>
> drivers/vfio/pci/vfio_pci_config.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 5628fe114347..91335e6de88a 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -849,11 +849,13 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm)
>
>  	/*
>  	 * Allow writes to device control fields, except devctl_phantom,
> -	 * which could confuse IOMMU, and the ARI bit in devctl2, which
> +	 * which could confuse IOMMU, MPS, which can break communication
> +	 * with other physical devices, and the ARI bit in devctl2, which
>  	 * is set at probe time. FLR gets virtualized via our writefn.
>  	 */
>  	p_setw(perm, PCI_EXP_DEVCTL,
> -	       PCI_EXP_DEVCTL_BCR_FLR, ~PCI_EXP_DEVCTL_PHANTOM);
> +	       PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD,
> +	       ~PCI_EXP_DEVCTL_PHANTOM);
>  	p_setw(perm, PCI_EXP_DEVCTL2, NO_VIRT, ~PCI_EXP_DEVCTL2_ARI);

Is it correct that the value read back will still be the one written by
the guest?
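
To make sure I read the perm_bits semantics correctly, here is a minimal
standalone model of what I think happens to DEVCTL with this patch
applied. It is only a sketch: struct devctl_model and the model_*()
helpers are hypothetical names of mine, not vfio code, and the FLR
writefn is ignored. The bit values are the ones from pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Device Control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size */
#define PCI_EXP_DEVCTL_PHANTOM	0x0100	/* Phantom Functions Enable */
#define PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size */
#define PCI_EXP_DEVCTL_BCR_FLR	0x8000	/* Function Level Reset */

/* Hypothetical stand-in for one 16-bit register as seen through vfio */
struct devctl_model {
	uint16_t hw;		/* physical register content */
	uint16_t vconfig;	/* shadow copy kept by the kernel */
	uint16_t virt;		/* bits served from the shadow copy */
	uint16_t write;		/* bits userspace is allowed to write */
};

static void model_init(struct devctl_model *m, uint16_t hw_val)
{
	m->hw = hw_val;
	m->vconfig = hw_val;
	/* Same masks as the p_setw() call in the patch */
	m->virt = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD;
	m->write = (uint16_t)~PCI_EXP_DEVCTL_PHANTOM;
	/* Every virtualized bit is writable, so it reads back as written */
	assert((m->virt & m->write) == m->virt);
}

static void model_write(struct devctl_model *m, uint16_t val)
{
	uint16_t to_virt = m->write & m->virt;
	uint16_t to_hw = m->write & (uint16_t)~m->virt;

	/* Virtualized and writable bits only land in the shadow copy */
	m->vconfig = (uint16_t)((m->vconfig & ~to_virt) | (val & to_virt));
	/* Non-virtualized and writable bits go through to hardware */
	m->hw = (uint16_t)((m->hw & ~to_hw) | (val & to_hw));
}

static uint16_t model_read(const struct devctl_model *m)
{
	/* Virtualized bits come from the shadow, the rest from hardware */
	return (uint16_t)((m->hw & ~m->virt) | (m->vconfig & m->virt));
}
```

If this model is right, a guest write of MPS is read back unchanged while
the physical MPS field keeps the host value, and a write to the
READRQ field still reaches the device.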

I see that MRRS can be derived from the MPS value read back, for some
pcie_bus_config values such as PCIE_BUS_PERFORMANCE. So one consequence
could be that the applied MRRS (which is not virtualized) ends up lower
than what the host set, for instance because of the MPSS of a guest PCIe
root port.

So if the above is not totally wrong, shouldn't we virtualize MRRS as well?
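
To illustrate the case I have in mind, here is a small sketch. It assumes
a guest that sets MRRS equal to the MPS it reads back, and that only the
MRRS write reaches hardware. The devctl_*() helpers and
hw_devctl_after_guest() are hypothetical names, not kernel functions.
Both fields encode a size of 128 << n bytes.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_DEVCTL_PAYLOAD	0x00e0	/* Max_Payload_Size, bits 7:5 */
#define PCI_EXP_DEVCTL_READRQ	0x7000	/* Max_Read_Request_Size, bits 14:12 */

/* Decode the MPS field: 000b = 128 bytes ... 101b = 4096 bytes */
static unsigned int devctl_mps_bytes(uint16_t devctl)
{
	unsigned int n = (devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5;

	assert(n <= 5);		/* 110b and 111b are reserved */
	return 128u << n;
}

/* Decode the MRRS field, same encoding as MPS */
static unsigned int devctl_mrrs_bytes(uint16_t devctl)
{
	unsigned int n = (devctl & PCI_EXP_DEVCTL_READRQ) >> 12;

	assert(n <= 5);
	return 128u << n;
}

/* Return devctl with the MRRS field set to 'bytes' (power of two) */
static uint16_t devctl_with_mrrs(uint16_t devctl, unsigned int bytes)
{
	unsigned int n = 0;

	assert(bytes >= 128 && bytes <= 4096 && (bytes & (bytes - 1)) == 0);
	while ((128u << n) < bytes)
		n++;
	return (uint16_t)((devctl & ~PCI_EXP_DEVCTL_READRQ) | (n << 12));
}

/*
 * Assumed scenario: the guest picks 'guest_mps_bytes' from its virtual
 * topology and then sets MRRS to the same value. The MPS write is
 * absorbed by the virtualization, the MRRS write is not.
 */
static uint16_t hw_devctl_after_guest(uint16_t hw_devctl,
				      unsigned int guest_mps_bytes)
{
	return devctl_with_mrrs(hw_devctl, guest_mps_bytes);
}
```

With a host setting of MPS = 256 and MRRS = 512 (0x2020) and a guest MPS
of 128, the physical MPS stays at 256 while the physical MRRS drops to
128 under these assumptions.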

Thanks

Eric

>  	return 0;
>  }
>