Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

From: Mika Penttilä
Date: Sun Dec 12 2021 - 01:45:21 EST




On 10.12.2021 9.36, Tian, Kevin wrote:
From: Jason Gunthorpe <jgg@xxxxxxxxxx>
Sent: Friday, December 10, 2021 4:59 AM

On Thu, Dec 09, 2021 at 09:32:42PM +0100, Thomas Gleixner wrote:
On Thu, Dec 09 2021 at 12:21, Jason Gunthorpe wrote:
On Thu, Dec 09, 2021 at 09:37:06AM +0100, Thomas Gleixner wrote:
If we keep the MSI emulation in the hypervisor then MSI != IMS. The
MSI code needs to write an addr/data pair compatible with the emulation
and the IMS code needs to write an addr/data pair from the
hypercall. Seems like this scenario is best avoided!

From this perspective I haven't connected how virtual interrupt
remapping helps in the guest? Is this a way to provide the hypercall
I'm imagining above?
That was my thought to avoid having different mechanisms.

The address/data pair is computed in two places:

1) Activation of an interrupt
2) Affinity setting on an interrupt

Both configure the IRTE when interrupt remapping is in place.

In both cases a vector is allocated in the vector domain and based on
the resulting target APIC / vector number pair the IRTE is
(re)configured.

So putting the hypercall into the vIRTE update is the obvious
place. Both activation and affinity setting can fail and propagate an
error code down to the originating caller.
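
To make the flow above concrete, here is a minimal sketch in C of a single
vIRTE update routine shared by activation and affinity setting, with the
hypercall's error code propagated to the caller. Every name in it
(hv_update_virte, virte_sketch, irq_activate_sketch, and so on) is
hypothetical and not an existing kernel or hypervisor interface, and the
addr/data values produced by the stand-in hypercall are placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none is an existing kernel or hypervisor API. */
struct msi_msg_sketch {
    uint64_t addr;
    uint32_t data;
};

struct virte_sketch {
    uint32_t dest_apic;
    uint8_t vector;
    struct msi_msg_sketch msg; /* addr/data pair handed back by the hypervisor */
};

/*
 * Stand-in for the hypercall. A real hypervisor would validate the request and
 * program its own remapping entry; the values returned here are placeholders.
 */
static int hv_update_virte(uint32_t apic, uint8_t vector, struct msi_msg_sketch *msg)
{
    if (vector < 32) /* x86 reserves vectors 0-31 for exceptions */
        return -EINVAL;
    msg->addr = 0xfee00000ull | ((uint64_t)(apic & 0xff) << 12);
    msg->data = vector;
    return 0;
}

/* The one place where the addr/data pair is obtained, used by both paths. */
static int virte_configure(struct virte_sketch *e, uint32_t apic, uint8_t vector)
{
    struct msi_msg_sketch msg;
    int ret;

    assert(e != NULL);
    ret = hv_update_virte(apic, vector, &msg);
    if (ret)
        return ret; /* the old entry stays untouched on failure */

    e->dest_apic = apic;
    e->vector = vector;
    e->msg = msg;
    return 0;
}

/* 1) Activation of an interrupt */
static int irq_activate_sketch(struct virte_sketch *e, uint32_t apic, uint8_t vector)
{
    return virte_configure(e, apic, vector);
}

/* 2) Affinity setting on an interrupt */
static int irq_set_affinity_sketch(struct virte_sketch *e, uint32_t apic, uint8_t vector)
{
    return virte_configure(e, apic, vector);
}
```

Because both entry points funnel through virte_configure(), MSI and IMS code
would see the same addr/data source, and a failed hypercall surfaces as an
ordinary error return from activation or affinity setting.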

Hmm?
Okay, I think I get it. Would be nice to have someone from Intel
familiar with the vIOMMU protocols and QEMU code comment on what the
hypervisor side can look like.

There is a bit more work here, we'd have to change VFIO to somehow
entirely disconnect the kernel IRQ logic from the MSI table and
directly pass control of it to the guest after the hypervisor IOMMU IR
secures it, i.e. directly mmap the MSI-X table into the guest.

It's supported already:

/*
* The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
* which allows direct access to non-MSIX registers which happened to be within
* the same system page.
*
* Even though the userspace gets direct access to the MSIX data, the existing
* VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configuration.
*/
#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3

IIRC this was introduced for PPC, for devices that have MSI-X in the same
BAR as other MMIO registers. Trapping MSI-X degrades performance on
accesses to adjacent registers. MSI-X can be mapped by userspace because
PPC already uses a hypercall mechanism for interrupts. Though I am unclear
about the details, it sounds like a usage similar to what is proposed here.
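
For reference, userspace discovers this capability by walking the capability
chain returned by VFIO_DEVICE_GET_REGION_INFO. The sketch below shows such a
walk over an already-filled buffer. The two structs are local mirrors of the
vfio_region_info and vfio_info_cap_header layouts in <linux/vfio.h>,
redeclared under hypothetical names so the example builds standalone; the
function name region_msix_mappable is likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_region_info (hypothetical name). */
struct region_info_sketch {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t cap_offset; /* offset of the first capability, from buffer start */
    uint64_t size;
    uint64_t offset;
};

/* Local mirror of struct vfio_info_cap_header (hypothetical name). */
struct cap_header_sketch {
    uint16_t id;
    uint16_t version;
    uint32_t next; /* offset of the next capability, 0 terminates the chain */
};

#define SKETCH_FLAG_CAPS         (1u << 3) /* VFIO_REGION_INFO_FLAG_CAPS */
#define SKETCH_CAP_MSIX_MAPPABLE 3         /* VFIO_REGION_INFO_CAP_MSIX_MAPPABLE */

/* Returns 1 if the region info in buf advertises the MSIX mappable capability. */
static int region_msix_mappable(const void *buf, size_t len)
{
    struct region_info_sketch info;
    struct cap_header_sketch hdr;
    size_t hops = 0;
    uint32_t off;

    assert(buf != NULL);
    if (len < sizeof(info))
        return 0;
    memcpy(&info, buf, sizeof(info));
    if (!(info.flags & SKETCH_FLAG_CAPS))
        return 0;

    for (off = info.cap_offset; off != 0; off = hdr.next) {
        /* Reject offsets that point into the header or past the buffer. */
        if (off < sizeof(info) || off > len - sizeof(hdr))
            return 0;
        /* Guard against a malformed chain that loops back on itself. */
        if (++hops > len / sizeof(hdr))
            return 0;
        memcpy(&hdr, (const uint8_t *)buf + off, sizeof(hdr));
        if (hdr.id == SKETCH_CAP_MSIX_MAPPABLE)
            return 1;
    }
    return 0;
}
```

In a real program the buffer would come from the ioctl, called a second time
with a larger argsz when the kernel reports that the capabilities did not fit.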

Thanks
Kevin

I see VFIO_REGION_INFO_CAP_MSIX_MAPPABLE is always set, so if the MSI-X
table is in its own BAR, QEMU never traps/emulates the access. On the
other hand, QEMU is said to depend on emulating masking. So how is this
supposed to work in case the table is not in the config BAR?
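
To spell out what "emulating masking" involves: each MSI-X table entry is 16
bytes (address low, address high, data, vector control), and bit 0 of the
vector control dword is the per-vector mask bit. A trapping emulator has to
see writes to that dword. The sketch below decodes a table write under the
assumption of aligned 32-bit accesses; the constant, struct, and function
names are hypothetical and are not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE        16u /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL 12u /* offset of the vector control dword */
#define MSIX_CTRL_MASKBIT      1u  /* bit 0: per-vector mask */

struct msix_ctrl_write {
    int hits_ctrl;   /* nonzero if the write targets a vector control dword */
    uint32_t vector; /* index of the table entry being written */
    int masked;      /* new mask state, meaningful only if hits_ctrl is set */
};

/* Decode an aligned 32-bit write at byte offset table_off into the table. */
static struct msix_ctrl_write decode_table_write(uint32_t table_off, uint32_t value)
{
    struct msix_ctrl_write w = {0, 0, 0};

    assert((table_off & 3u) == 0); /* the sketch handles dword writes only */
    w.vector = table_off / MSIX_ENTRY_SIZE;
    if (table_off % MSIX_ENTRY_SIZE == MSIX_ENTRY_VECTOR_CTRL) {
        w.hits_ctrl = 1;
        w.masked = (value & MSIX_CTRL_MASKBIT) != 0;
    }
    return w;
}
```

If the table is mmapped straight into the guest, no such decode step runs in
the emulator, which is the gap the question above is asking about.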

Thanks,
Mika