Re: [RFC 08/20] vfio/pci: Add VFIO_DEVICE_BIND_IOMMUFD

From: Jason Gunthorpe
Date: Tue Sep 21 2021 - 13:29:49 EST


On Sun, Sep 19, 2021 at 02:38:36PM +0800, Liu Yi L wrote:
> This patch adds VFIO_DEVICE_BIND_IOMMUFD for userspace to bind the vfio
> device to an iommufd. No VFIO_DEVICE_UNBIND_IOMMUFD interface is provided
> because it's implicitly done when the device fd is closed.
>
> In concept a vfio device can be bound to multiple iommufds, each hosting
> a subset of I/O address spaces attached by this device. However as a
> starting point (matching current vfio), only one I/O address space is
> supported per vfio device. It implies one device can only be attached
> to one iommufd at this point.
>
> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
> drivers/vfio/pci/Kconfig | 1 +
> drivers/vfio/pci/vfio_pci.c | 72 ++++++++++++++++++++++++++++-
> drivers/vfio/pci/vfio_pci_private.h | 8 ++++
> include/uapi/linux/vfio.h | 30 ++++++++++++
> 4 files changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 5e2e1b9a9fd3..3abfb098b4dc 100644
> +++ b/drivers/vfio/pci/Kconfig
> @@ -5,6 +5,7 @@ config VFIO_PCI
> depends on MMU
> select VFIO_VIRQFD
> select IRQ_BYPASS_MANAGER
> + select IOMMUFD
> help
> Support for the PCI VFIO bus driver. This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 145addde983b..20006bb66430 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -552,6 +552,16 @@ static void vfio_pci_release(struct vfio_device *core_vdev)
> vdev->req_trigger = NULL;
> }
> mutex_unlock(&vdev->igate);
> +
> + mutex_lock(&vdev->videv_lock);
> + if (vdev->videv) {
> + struct vfio_iommufd_device *videv = vdev->videv;
> +
> + vdev->videv = NULL;
> + iommufd_unbind_device(videv->idev);
> + kfree(videv);
> + }
> + mutex_unlock(&vdev->videv_lock);
> }
>
> mutex_unlock(&vdev->reflck->lock);
> @@ -780,7 +790,66 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
> container_of(core_vdev, struct vfio_pci_device, vdev);
> unsigned long minsz;
>
> - if (cmd == VFIO_DEVICE_GET_INFO) {
> + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) {

Choosing to implement this through the ioctl multiplexor is what is
causing so much ugly gyration in the previous patches

This should be a straightforward new function and ops:

struct iommufd_device *vfio_pci_bind_iommufd(struct vfio_device *)
{
iommu_dev = iommufd_bind_device(bind_data.iommu_fd,
&vdev->pdev->dev,
bind_data.dev_cookie);
if (!iommu_dev) return ERR
vdev->iommu_dev = iommu_dev;
}
static const struct vfio_device_ops vfio_pci_ops = {
.bind_iommufd = &*vfio_pci_bind_iommufd

If you do the other stuff I said then you'll notice that the
iommufd_bind_device() will provide automatic exclusivity.

The thread that sees ops->bind_device succeed will know it is the only
thread that can see that (by definition, the iommu enable user stuff
has to be exclusive and race free) thus it can go ahead and store the
iommu pointer.

The other half of the problem '&vdev->block_access' is solved by
manipulating the filp->f_ops. Start with a fops that can ONLY call the
above op. When the above op succeeds switch the fops to the normal
full ops. .

The same flow happens when the group fd spawns the device fd, just
parts of iommfd_bind_device are open coded into the vfio code, but the
whole flow and sequence should be the same.

> + /*
> + * Reject the request if the device is already opened and
> + * attached to a container.
> + */
> + if (vfio_device_in_container(core_vdev))
> + return -ENOTTY;

This is wrongly locked

> +
> + minsz = offsetofend(struct vfio_device_iommu_bind_data, dev_cookie);
> +
> + if (copy_from_user(&bind_data, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (bind_data.argsz < minsz ||
> + bind_data.flags || bind_data.iommu_fd < 0)
> + return -EINVAL;
> +
> + mutex_lock(&vdev->videv_lock);
> + /*
> + * Allow only one iommufd per device until multiple
> + * address spaces (e.g. vSVA) support is introduced
> + * in the future.
> + */
> + if (vdev->videv) {
> + mutex_unlock(&vdev->videv_lock);
> + return -EBUSY;
> + }
> +
> + idev = iommufd_bind_device(bind_data.iommu_fd,
> + &vdev->pdev->dev,
> + bind_data.dev_cookie);
> + if (IS_ERR(idev)) {
> + mutex_unlock(&vdev->videv_lock);
> + return PTR_ERR(idev);
> + }
> +
> + videv = kzalloc(sizeof(*videv), GFP_KERNEL);
> + if (!videv) {
> + iommufd_unbind_device(idev);
> + mutex_unlock(&vdev->videv_lock);
> + return -ENOMEM;
> + }
> + videv->idev = idev;
> + videv->iommu_fd = bind_data.iommu_fd;

No need for more memory, a struct vfio_device can be attached to a
single iommu context. If idev then the context and all the other
information is valid.

> + if (atomic_read(&vdev->block_access))
> + atomic_set(&vdev->block_access, 0);

I'm sure I'll tell you this is all wrongly locked too if I look
closely.

> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
> + * struct vfio_device_iommu_bind_data)
> + *
> + * Bind a vfio_device to the specified iommufd
> + *
> + * The user should provide a device cookie when calling this ioctl. The
> + * cookie is later used in iommufd for capability query, iotlb invalidation
> + * and I/O fault handling.
> + *
> + * User is not allowed to access the device before the binding operation
> + * is completed.
> + *
> + * Unbind is automatically conducted when device fd is closed.
> + *
> + * Input parameters:
> + * - iommu_fd;
> + * - dev_cookie;
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_iommu_bind_data {
> + __u32 argsz;
> + __u32 flags;
> + __s32 iommu_fd;
> + __u64 dev_cookie;

Missing explicit padding

Always use __aligned_u64 in uapi headers, fix all the patches.

Jason