RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.

From: Tian, Kevin
Date: Mon Apr 27 2020 - 08:13:55 EST


> From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> Sent: Monday, April 27, 2020 3:14 AM
[...]
> > technically Scalable IOV is definitely different from SR-IOV. It's
> > simpler in hardware. And we're not emulating SR-IOV. The point
> > is just in usage-wise we want to present a consistent user
> > experience just like passing through a PCI endpoint (PF or VF) device
> > through vfio eco-system, including various userspace VMMs (Qemu,
> > firecracker, rust-vmm, etc.), middleware (Libvirt), and higher level
> > management stacks.
>
> Yes, I understand your desire, but at the same time we have not been
> doing device emulation in the kernel. You should at least be
> forthwright about that major change in the cover letters/etc.

I searched for 'emulate' in kernel/Documentation:

Documentation/sound/alsa-configuration.rst (emulate oss on alsa)
Documentation/security/tpm/tpm_vtpm_proxy.rst (emulate virtual TPM)
Documentation/networking/generic-hdlc.txt (emulate eth on HDLC)
Documentation/gpu/todo.rst (generic fbdev emulation)
...

I believe the main reason for putting such emulation in the kernel is
that those emulated device interfaces have established eco-systems
and value which the kernel shouldn't break. As you emphasized earlier,
they had good reasons for getting into the kernel.

Now back to this context. Almost every newly-born Linux VMM
(firecracker, crosvm, cloud hypervisor, and some proprietary
implementations) supports only two types of devices: virtio and
vfio, because they want to be simple and slim. Virtio provides a
basic set of I/O capabilities required by most VMs, while vfio brings
a unified interface for gaining added value or higher performance
from assigned devices. Even Qemu now supports a minimal configuration
('microvm'), for similar reasons. So the vfio eco-system is
significant and represents a major trend in the virtualization space.

So supporting the vfio eco-system is actually the usage GOAL
of this patch series, not an optional technique to opt into.
vfio-pci is there for passing through standalone PCI endpoints
(PF or VF), and vfio-mdev is there for passing through a smaller
portion of device resources while sharing the same VFIO interface
to gain uniform support in this eco-system.

I believe the above is a good reason for putting emulation in the idxd
driver by using vfio-mdev. Yes, it does imply that there will be
more emulation in the kernel when more Scalable-IOV (or similar)
devices are introduced. But as explained earlier, the PCI config
space emulation can be largely consolidated and reused, and
the remaining device-specific MMIO emulation is relatively
simple because we define the virtual device interface to be the same
as or even simpler than a VF interface. Only a small set of registers
is emulated after the fast-path resource is passed through, and
that small set of course needs to meet the normal quality
requirements for getting into the kernel.

We'll definitely highlight this part in future cover letters.

>
> > > The only thing we get out of this is someone doesn't have to write a
> > > idxd emulation driver in qemu, instead they have to write it in the
> > > kernel. I don't see how that is a win for the ecosystem.
> >
> > No. The clear win is on leveraging classic VFIO iommu and its eco-system
> > as explained above.
>
> vdpa had no problem implementing iommu support without VFIO. This was
> their original argument too, it turned out to be erroneous.
>

Every wheel can be re-invented... my gut feeling is that vdpa is for
offloading fast-path vhost operations to the underlying accelerators.
It is just a welcome/reasonable extension to the existing virtio/vhost
eco-system. For other types of devices such as idxd, we rely on the vfio
eco-system to keep up with the fast-evolving VMM spectrum.

Thanks
Kevin