RE: Virtualizing MSI-X on IMS via VFIO

From: Thomas Gleixner
Date: Wed Jun 23 2021 - 22:20:41 EST


Kevin,

thank you very much for digging into this! You made my day!

On Thu, Jun 24 2021 at 00:00, Kevin Tian wrote:
>> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> To work with what we've got, the vfio API describes the limitation of
>> the host interfaces via the VFIO_IRQ_INFO_NORESIZE flag. QEMU then
>> makes a choice in an attempt to better reflect what we can infer of the
>> guest programming of the device to incrementally enable vectors. We
>
> It's a surprise to me that Qemu even doesn't look at this flag today after
> searching its code...

Indeed.

git clone https://github.com/qemu/qemu.git
cd qemu
git log -p | grep NORESIZE
+ * The NORESIZE flag indicates that the interrupt lines within the index
+#define VFIO_IRQ_INFO_NORESIZE (1 << 3)

According to the git history of QEMU this was never used at all and I
don't care about the magic muck which might be in some RHT repository
which might make use of that.

Find below the proper fix for this nonsense which just wasted everyones
time. I'll post it officialy with a proper changelog tomorrow unless
Kevin beats me to it who actually unearthed this and surely earns the
credit.

Alex, I seriously have to ask what you were trying to tell us about this
flag and it's great value and the design related to this.

I'm sure you can submit the corresponding fix to qemu yourself.

And once you are back from lala land, can you please explain how
VFIO/PCI/MSIX is supposed to work in reality?

Thanks,

tglx
---
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1644,8 +1644,6 @@ static long intel_vgpu_ioctl(struct mdev
if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
info.flags |= (VFIO_IRQ_INFO_MASKABLE |
VFIO_IRQ_INFO_AUTOMASKED);
- else
- info.flags |= VFIO_IRQ_INFO_NORESIZE;

return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1018,8 +1018,6 @@ static long vfio_pci_ioctl(struct vfio_d
if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
info.flags |= (VFIO_IRQ_INFO_MASKABLE |
VFIO_IRQ_INFO_AUTOMASKED);
- else
- info.flags |= VFIO_IRQ_INFO_NORESIZE;

return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -693,16 +693,6 @@ struct vfio_region_info_cap_nvlink2_lnks
* automatically masked by VFIO and the user needs to unmask the line
* to receive new interrupts. This is primarily intended to distinguish
* level triggered interrupts.
- *
- * The NORESIZE flag indicates that the interrupt lines within the index
- * are setup as a set and new subindexes cannot be enabled without first
- * disabling the entire index. This is used for interrupts like PCI MSI
- * and MSI-X where the driver may only use a subset of the available
- * indexes, but VFIO needs to enable a specific number of vectors
- * upfront. In the case of MSI-X, where the user can enable MSI-X and
- * then add and unmask vectors, it's up to userspace to make the decision
- * whether to allocate the maximum supported number of vectors or tear
- * down setup and incrementally increase the vectors as each is enabled.
*/
struct vfio_irq_info {
__u32 argsz;
@@ -710,7 +700,6 @@ struct vfio_irq_info {
#define VFIO_IRQ_INFO_EVENTFD (1 << 0)
#define VFIO_IRQ_INFO_MASKABLE (1 << 1)
#define VFIO_IRQ_INFO_AUTOMASKED (1 << 2)
-#define VFIO_IRQ_INFO_NORESIZE (1 << 3)
__u32 index; /* IRQ index */
__u32 count; /* Number of IRQs within this index */
};
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -1092,9 +1092,6 @@ static int mtty_get_irq_info(struct mdev
if (irq_info->index == VFIO_PCI_INTX_IRQ_INDEX)
irq_info->flags |= (VFIO_IRQ_INFO_MASKABLE |
VFIO_IRQ_INFO_AUTOMASKED);
- else
- irq_info->flags |= VFIO_IRQ_INFO_NORESIZE;
-
return 0;
}