Re: [PATCH v6 15/20] vfio/mdev: idxd: ims domain setup for the vdcm

From: Jason Gunthorpe
Date: Thu May 27 2021 - 09:36:48 EST


On Wed, May 26, 2021 at 06:41:07PM -0700, Raj, Ashok wrote:
> On Wed, May 26, 2021 at 09:54:44PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 26, 2021 at 05:22:22PM -0700, Dave Jiang wrote:
> > >
> > > On 5/23/2021 4:50 PM, Jason Gunthorpe wrote:
> > > > On Fri, May 21, 2021 at 05:20:37PM -0700, Dave Jiang wrote:
> > > > > @@ -77,8 +80,18 @@ int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv)
> > > > > return rc;
> > > > > }
> > > > > + ims_info.max_slots = idxd->ims_size;
> > > > > + ims_info.slots = idxd->reg_base + idxd->ims_offset;
> > > > > + idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);
> > > > > + if (!idxd->ims_domain) {
> > > > > + dev_warn(dev, "Fail to acquire IMS domain\n");
> > > > > + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
> > > > > + return -ENODEV;
> > > > > + }
> > > > I'm quite surprised that every mdev doesn't create its own ims_domain
> > > > in its probe function.
> > > >
> > > > This places a global total limit on the # of vectors which makes me
> > > > ask what was the point of using IMS in the first place ?
> > > >
> > > > The entire idea for IMS was to make the whole allocation system fully
> > > > dynamic based on demand.
> > >
> > > Hi Jason, thank you for the review of the series.
> > >
> > > My understanding is that the driver creates a single IMS domain for the
> > > device and provides the address base and IMS numbers for the domain based on
> > > device IMS resources. So the IMS region needs to be contiguous. Each mdev
> > > can call msi_domain_alloc_irqs() and acquire the number of IMS vectors it
> > > desires and the DEV MSI core code will keep track of which vectors are being
> > > used. This allows the mdev devices to dynamically allocate based on demand.
> > > If the driver allocates a domain per mdev, it'll needs to do internal
> > > accounting of the base and vector numbers for each of those domains that the
> > > MSI core already provides. Isn't that what we are trying to avoid? As mdevs
> > > come and go, that partitioning will become fragmented.
> >
> > I suppose it depends entirely on how the HW works.
> >
> > If the HW has a fixed number of interrupt vectors organized in a
> > single table then by all means allocate a single domain that spans the
> > entire fixed HW vector space. But then why do we have a ims_size
> > variable here??
> >
> > However, that really begs the question of why the HW is using IMS at
> > all? I'd expect needing 2x-10x the max MSI-X vector size before
> > reaching for IMS.
>
> Its more than the number of vectors. Yes, thats one of the attributes.
> IMS also has have additional flexibility. I think we covered this a while
> back but maybe lost since its been a while.
>
> - Format isn't just the standard MSIx, for e.g. DSA has the pending bits
> all merged and co-located together with the interrupt store.

But this is just random hardware churn, there is nothing wrong with
keeping the pending bits in the standard format

> - You might want the vector space to be mabe on device. (I think you
> alluded one of your devices can actually do that?)

Sure, but IDXD is not doing this

> - Remember we do handle validation when interrupts are requested from user
> space. Interrupts are validated with PASID of requester. (I think we also
> talked about if we should turn the interrupt message to also take a PASID
> as opposed to request without PASID as its specified in PCIe)

Yes, but overall this doesn't really make sense to me, and doesn't in
of itself require IMS. The PASID table could be an addendum to the
normal MSI-X table.

Besides, Interrupts and PASID are not related concepts. Does every
user SVA process with a unique PASID get a unique interrupt? The
device and CPU doesn't have enough vectors to do this.

Frankly I expect interrupts to be multiplexed by queues not by PASID,
so that interrupts can be shared.

> - For certain devices the interupt might be simply in the user context
> maintained by the kernel. Graphics for e.g.

IDXD is also not doing this.

Jason