Re: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.

From: Alex Williamson
Date: Sun Apr 26 2020 - 23:44:14 EST


On Sun, 26 Apr 2020 16:13:57 -0300
Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:

> On Sun, Apr 26, 2020 at 05:18:59AM +0000, Tian, Kevin wrote:
>
> > > > I think providing a unified abstraction to userspace is also important,
> > > > which is what VFIO provides today. The merit of using one set of VFIO
> > > > API to manage all kinds of mediated devices and VF devices is a major
> > > > gain. Instead, inventing a new vDPA-like interface for every Scalable-IOV
> > > > or equivalent device is just overkill and doesn't scale. Also the actual
> > > > emulation code in the idxd driver is small, if you put aside the PCI
> > > > config space part, for which I already explained that most logic could
> > > > be shared between mdev device drivers.
> > >
> > > If it was just config space you might have an argument, VFIO already
> > > does some config space mangling, but emulating BAR space is out of
> > > scope of VFIO, IMHO.
> >
> > out of scope of vfio-pci, but in scope of vfio-mdev. btw I feel that most
> > of your objections are actually related to the general idea of
> > vfio-mdev.
>
> There have been several abusive proposals of vfio-mdev, everything
> from a way to create device drivers to this kind of generic emulation
> framework.
>
> > Scalable IOV just uses PASID to harden DMA isolation in mediated
> > pass-through usage which vfio-mdev enables. Then are you just opposing
> > the whole vfio-mdev? If not, I'm curious about the criteria in your mind
> > about when using vfio-mdev is good...
>
> It is appropriate when non-PCI standard techniques are needed to do
> raw device assignment, just like VFIO.
>
> Basically if vfio-pci is already doing it then it seems reasonable
> that vfio-mdev should do the same. This mission creep where vfio-mdev
> gains functionality far beyond VFIO is the problem.

Ehm, vfio-pci emulates BARs too. We also emulate FLR, power
management, DisINTx, and VPD. FLR, PM, and VPD all have device-specific
quirks in the host kernel, and I've generally taken the stance
that we should take advantage of those quirks, not duplicate them in
userspace and not invent new access mechanisms/ioctls for each of them.
Emulating DisINTx is convenient since we must have a mechanism to mask
INTx anyway, whether it's at the device or the APIC, so we can pretend
the hardware supports it. BAR emulation is really too trivial to argue
about: the BARs mean nothing to the physical device mapping, they're
simply scratch registers whose alignment bits we mask out on read.
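As a rough illustration of that "scratch register" behavior, here is a minimal C sketch of one emulated 32-bit memory BAR. The struct and function names (`emu_bar`, `emu_bar_write`, `emu_bar_read`, `bar_size_from_probe`) are hypothetical and this is not vfio-pci's actual code. It only models the idea that guest writes are stored and never forwarded to the device, while reads come back with the bits below the BAR's size alignment cleared and the read-only type flags ORed in, which is all a guest's standard sizing probe (write all ones, read back) relies on.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one emulated 32-bit memory BAR. The names are
 * illustrative only; this is not vfio-pci's real data structure.
 */
struct emu_bar {
    uint32_t scratch; /* last value the guest wrote; never sent to hardware */
    uint32_t size;    /* BAR size in bytes: a power of two, at least 16 */
    uint32_t flags;   /* read-only low bits (memory space, type, prefetch) */
};

/* Guest config write: just remember the value in the scratch register. */
static void emu_bar_write(struct emu_bar *bar, uint32_t val)
{
    bar->scratch = val;
}

/*
 * Guest config read: clear the bits below the BAR's natural alignment and
 * report the read-only flag bits, as a real BAR would.
 */
static uint32_t emu_bar_read(const struct emu_bar *bar)
{
    assert(bar->size >= 16u && (bar->size & (bar->size - 1u)) == 0);
    uint32_t align_mask = ~(bar->size - 1u);
    return (bar->scratch & align_mask) | (bar->flags & 0xfu);
}

/* What a guest does after writing all ones: decode the size from the read-back. */
static uint32_t bar_size_from_probe(uint32_t readback)
{
    return ~(readback & ~0xfu) + 1u;
}
```

With a 16 KiB prefetchable BAR modeled this way (`size = 0x4000`, `flags = 0x8`), writing `0xffffffff` reads back as `0xffffc008`, from which the guest recovers the `0x4000` size, and an unaligned address such as `0xfebf1234` reads back as `0xfebf0008`. Nothing here touches the physical device, which is the point being made: the actual mapping into the VM is done elsewhere.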
vfio-pci is a mix of things that we decide are too complicated or
irrelevant to emulate in the kernel and things that take advantage of
shared quirks or are just too darn easy to worry about. BARs fall into
that latter category: any sort of mapping into VM address spaces is
necessarily done in userspace, but scratch registers that are masked on
read? *shrug*, vfio-pci does that. Thanks,

Alex

> > Technically, Scalable IOV is definitely different from SR-IOV. It's
> > simpler in hardware. And we're not emulating SR-IOV. The point
> > is just that, usage-wise, we want to present a consistent user
> > experience, just like passing through a PCI endpoint (PF or VF) device
> > through the vfio eco-system, including various userspace VMMs (Qemu,
> > firecracker, rust-vmm, etc.), middleware (Libvirt), and higher level
> > management stacks.
>
> Yes, I understand your desire, but at the same time we have not been
> doing device emulation in the kernel. You should at least be
> forthright about that major change in the cover letters/etc.
>
> > > The only thing we get out of this is someone doesn't have to write an
> > > idxd emulation driver in qemu, instead they have to write it in the
> > > kernel. I don't see how that is a win for the ecosystem.
> >
> > No. The clear win is on leveraging classic VFIO iommu and its eco-system
> > as explained above.
>
> vdpa had no problem implementing iommu support without VFIO. This was
> their original argument too, it turned out to be erroneous.
>
> Jason
>