Re: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.

From: Jason Gunthorpe
Date: Thu Apr 23 2020 - 15:18:59 EST


On Wed, Apr 22, 2020 at 02:24:11PM -0700, Dan Williams wrote:
> On Tue, Apr 21, 2020 at 4:55 PM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Apr 21, 2020 at 04:33:46PM -0700, Dave Jiang wrote:
> > > The actual code is independent of the stage 2 driver code submission that adds
> > > support for SVM, ENQCMD(S), PASID, and shared workqueues. This code series will
> > > support dedicated workqueues on a guest with no vIOMMU.
> > >
> > > A new device type "mdev" is introduced for the idxd driver. This allows the wq
> > > to be dedicated to the usage of a VFIO mediated device (mdev). Once the work
> > > queue (wq) is enabled, a UUID generated by the user can be added to the wq
> > > through the uuid sysfs attribute for the wq. After the association, an mdev can
> > > be created using this UUID. The mdev driver code will associate the UUID and
> > > set up the mdev on the driver side. When the create operation is successful, the
> > > UUID can be passed to qemu. When the guest boots up, it should discover a DSA
> > > device when doing PCI discovery.
> >
> > I'm feeling really skeptical that adding all this PCI config space and
> > MMIO BAR emulation to the kernel just to cram this into a VFIO
> > interface is a good idea, that kind of stuff is much safer in
> > userspace.
> >
> > Particularly since vfio is not really needed once a driver is using
> > the PASID stuff. We already have general code for drivers to use to
> > attach a PASID to a mm_struct - and using vfio while disabling all the
> > DMA/iommu config really seems like an abuse.
> >
> > A /dev/idxd char dev that mmaps a bar page and links it to a PASID
> > seems a lot simpler and saner kernel wise.
> >
> > > The mdev utilizes Interrupt Message Store or IMS[3] instead of MSIX for
> > > interrupts for the guest. This preserves MSIX for host usages and also allows a
> > > significantly larger number of interrupt vectors for guest usage.
> >
> > I never did get a reply to my earlier remarks on the IMS patches.
> >
> > The concept of a device specific addr/data table format for MSI is not
> > Intel specific. This should be general code. We have a device that can
> > use this kind of kernel capability today.
>
> This has been my concern reviewing the implementation. IMS needs more
> than one in-tree user to validate degrees of freedom in the api. I had
> been missing a second "in-tree user" to validate the scope of the
> flexibility that was needed.

IMS is too narrowly specified.

All platforms that support MSI today can support IMS. It is simply a
way for the platform to give the driver an addr/data pair that triggers
an interrupt when a posted write of the data value is made to that
address.
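To illustrate that the table format is entirely the device's business,
here is a small userspace model. Everything in it is hypothetical:
hyp_queue_ctx is an invented per-queue layout (not the idxd one), and
bus_posted_write32() merely records a write, standing in for the
interconnect. The driver stores the platform-supplied pair wherever its
device wants it; raising the interrupt is nothing more than the device
writing the data value to the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-specific layout: the addr/data pair lives inside a
 * per-queue context, not in a PCI MSI-X table. Field names are invented. */
struct hyp_queue_ctx {
	uint32_t head, tail;	/* unrelated device state */
	uint64_t irq_addr;	/* address handed out by the platform */
	uint32_t irq_data;	/* data value handed out by the platform */
	uint8_t  irq_enable;
};

/* Record of the last write seen on the simulated bus. */
struct bus_write {
	uint64_t addr;
	uint32_t data;
	unsigned int count;
};
static struct bus_write last_write;

/* Simulated interconnect: models a posted (fire-and-forget) 32-bit write
 * by recording it; there is no completion to wait for. */
static void bus_posted_write32(uint64_t addr, uint32_t data)
{
	last_write.addr = addr;
	last_write.data = data;
	last_write.count++;
}

/* Driver side: program a platform-supplied pair into the device's own table. */
static void hyp_program_irq(struct hyp_queue_ctx *q, uint64_t addr, uint32_t data)
{
	assert(q != NULL);
	q->irq_addr = addr;
	q->irq_data = data;
	q->irq_enable = 1;
}

/* Device side: signal completion on a queue by writing its data to its address. */
static void hyp_device_complete(struct hyp_queue_ctx *q)
{
	q->head = q->tail;
	if (q->irq_enable)
		bus_posted_write32(q->irq_addr, q->irq_data);
}
```

Nothing in the model touches PCI config space; the only contract with
the platform is the addr/data pair itself.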

IMS is different from the other interrupt setup flows, which are
tightly tied to the PCI layer. Here the driver should simply ask for
interrupts.

I.e., the entire IMS API exposed to the driver should be something very
simple, like:

struct message_irq {
	uint64_t addr;	/* address the device writes to */
	uint32_t data;	/* value written to raise the interrupt */
};

struct message_irq *request_message_irq(struct device *,
					irq_handler_t handler,
					unsigned long flags,
					const char *name, void *dev);

And the plumbing underneath should set up the irq chips and so forth as
required.
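A userspace mock of how a driver might consume such a call is sketched
below, assuming stand-in definitions of struct device, irqreturn_t and
irq_handler_t (the real ones live in the kernel headers). The vector
pool, MOCK_MSG_ADDR and mock_posted_write() are hypothetical; they stand
in for the irq chip plumbing underneath, and the addr/data encoding is
arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types, for illustration only. */
struct device { const char *name; };
typedef enum { IRQ_NONE, IRQ_HANDLED } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

struct message_irq {
	uint64_t addr;
	uint32_t data;
};

/* Hypothetical platform state: one doorbell address, data selects the vector. */
#define MOCK_MSG_ADDR	0x1000u
#define MOCK_MAX_IRQS	16

struct mock_vector {
	struct message_irq msg;
	irq_handler_t handler;
	void *dev_id;
	const char *name;
};

static struct mock_vector mock_vectors[MOCK_MAX_IRQS];
static unsigned int mock_nr_vectors;

/* Sketch of the proposed call: take a free vector, remember the handler,
 * and return the addr/data pair the driver programs into its own table. */
struct message_irq *request_message_irq(struct device *device,
					irq_handler_t handler,
					unsigned long flags,
					const char *name, void *dev)
{
	struct mock_vector *v;

	(void)device;	/* real plumbing would pick the irq domain from this */
	(void)flags;
	if (!handler || mock_nr_vectors >= MOCK_MAX_IRQS)
		return NULL;

	v = &mock_vectors[mock_nr_vectors];
	v->msg.addr = MOCK_MSG_ADDR;
	v->msg.data = mock_nr_vectors;	/* data value encodes the vector index */
	v->handler = handler;
	v->dev_id = dev;
	v->name = name;
	mock_nr_vectors++;
	return &v->msg;
}

/* Stand-in for the interrupt controller receiving the device's posted write. */
irqreturn_t mock_posted_write(uint64_t addr, uint32_t data)
{
	if (addr != MOCK_MSG_ADDR || data >= mock_nr_vectors)
		return IRQ_NONE;
	return mock_vectors[data].handler((int)data, mock_vectors[data].dev_id);
}

/* Example driver handler: counts interrupts through its cookie. */
static irqreturn_t example_handler(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}
```

In this model a driver would call request_message_irq() once per
interrupt source and copy the returned addr/data into its
device-specific table; the platform side stays free to pick any
encoding.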

Jason