Re: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.

From: Jason Gunthorpe
Date: Fri Apr 24 2020 - 14:12:16 EST


On Fri, Apr 24, 2020 at 04:25:56PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Friday, April 24, 2020 8:45 PM
> >
> > On Fri, Apr 24, 2020 at 03:27:41AM +0000, Tian, Kevin wrote:
> >
> > > > > That by itself doesn't translate to what a guest typically does
> > > > > with a VDEV. There are other control paths that need to be serviced
> > > > > from the kernel code via VFIO. Fast-path operations such as
> > > > > ringing doorbells are managed directly by the guest.
> > > >
> > > > You don't need vfio to mmap BAR pages to userspace. The unique thing
> > > > that vfio gives is it provides a way to program the classic non-PASID
> > > > iommu, which you are not using here.
> > >
> > > That unique thing is indeed used here. Please note that sharing the CPU
> > > virtual address space with the device (what the SVA API was invented
> > > for) is not the purpose of this series. We still rely on classic
> > > non-PASID iommu programming, i.e. mapping/unmapping IOVA->HPA per
> > > iommu_domain. Although we do use PASID to tag the ADI, the PASID is
> > > contained within the iommu_domain and invisible to VFIO. From the
> > > userspace p.o.v., this is a device passthrough usage instead of
> > > PASID-based address space binding.
> >
> > So you have PASID support but don't use it? Why? PASID is much better
> > than the classic VFIO iommu; it doesn't require page pinning...
>
> PASID and I/O page faults (through ATS/PRI) are orthogonal things. Don't
> equate them. The host driver can tag the ADI with a PASID so that every
> DMA request out of that ADI carries a PASID prefix, allowing VT-d to do
> PASID-granular DMA isolation. However, I/O page faults cannot be taken
> for granted. A Scalable IOV device may support PASID without supporting
> ATS/PRI. Even when ATS/PRI is supported, whether I/O page faults are
> tolerated is decided by the work queue mode that the guest configures.
> For example, if the guest puts the work queue in non-faultable
> transaction mode, the device doesn't issue PRI requests and simply
> reports an error if there is no valid IOMMU mapping.

Okay, that makes sense. I wasn't aware people were doing PASID without
ATS at this point.
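A minimal sketch of the distinction described above, in C. Every name here (`struct adi_state`, `handle_translation_miss()`, the `DMA_MISS_*` values) is hypothetical and does not come from idxd, the VT-d driver or the IOMMU core; it only models the decision as stated in this thread: PASID tagging gives PASID-granular isolation on its own, while whether a translation miss can be resolved through a page request depends on both ATS/PRI support and the guest-configured work queue mode.

```c
#include <stdbool.h>

/* Hypothetical outcome of a DMA translation miss for a PASID-tagged
 * request. Illustrative only; not idxd, VT-d or IOMMU core code. */
enum dma_miss_result {
    DMA_MISS_PAGE_REQUEST, /* device sends a PRI page request to resolve the miss */
    DMA_MISS_ERROR,        /* no recovery path: the device reports an error */
};

/* Hypothetical per-ADI state. */
struct adi_state {
    unsigned int pasid;    /* PASID the host driver tagged onto this ADI */
    bool ats_pri_capable;  /* device implements ATS/PRI */
    bool wq_faultable;     /* work queue mode as configured by the guest */
};

/* The PASID is present regardless of the outcome: it provides isolation,
 * not fault handling. A miss is only recoverable when the device supports
 * ATS/PRI *and* the guest put the work queue in a faultable mode. */
static enum dma_miss_result handle_translation_miss(const struct adi_state *adi)
{
    if (adi->ats_pri_capable && adi->wq_faultable)
        return DMA_MISS_PAGE_REQUEST;
    return DMA_MISS_ERROR;
}
```

For example, an ADI on an ATS/PRI-capable device whose work queue the guest has put in non-faultable mode still takes the error path, so every DMA target must already be mapped in the iommu_domain.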

> > > idxd is just the first device that supports Scalable IOV. We have a
> > > lot more coming later, of different types. Putting such emulation
> > > in user space then means that Qemu needs to support all those
> > > vendor-specific interfaces for every new device which supports
> >
> > It would be very sad to see an endless amount of device emulation code
> > crammed into the kernel. Userspace is where device emulation is
> > supposed to live. For security
>
> I think providing a unified abstraction to userspace is also important,
> which is what VFIO provides today. Using one set of VFIO APIs to manage
> all kinds of mediated devices and VF devices is a major gain. In
> contrast, inventing a new vDPA-like interface for every Scalable-IOV or
> equivalent device is overkill and doesn't scale. Also, the emulation
> code in the idxd driver is actually small if you set aside the PCI
> config space part, and as I already explained, most of that logic could
> be shared between mdev device drivers.

If it were just config space you might have an argument; VFIO already
does some config space mangling. But emulating BAR space is out of
scope for VFIO, IMHO.

I also think it is disingenuous to pretend this is similar to
SR-IOV. SR-IOV is self-contained and the BAR does not require
emulation. What you have here sounds like it is just an ordinary
multi-queue device with the ability to PASID-tag queues for IOMMU
handling. This is absolutely not SR-IOV; it is much closer to VDPA,
which isn't using mdev.

Further, I disagree with your assessment that this doesn't scale. You
already said you plan a normal user interface for idxd, so instead of
having a single sane user interface (à la VDPA), idxd now needs *two*. If
this is the general pattern of things to come, it is a bad path.

The only thing we get out of this is that someone doesn't have to write
an idxd emulation driver in qemu; instead they have to write it in the
kernel. I don't see how that is a win for the ecosystem.

Jason