ARM PCI/MSI KVM passthrough with GICv2M

From: Eric Auger
Date: Fri Feb 05 2016 - 12:33:06 EST


Hi Alex,

I tried to sketch a proposal for guaranteeing IRQ integrity when
doing ARM PCI/MSI passthrough with the ARM GICv2M msi-controller. It is
based on an extended VFIO group viability control, as detailed below.

Unlike the ARM GICv3 ITS, this MSI controller does *not* support IRQ
remapping. It can expose one or more 4kB MSI frames. Each frame contains a
single register where the MSI data is written.
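To make the integrity issue concrete, here is a minimal C sketch of how an MSI message could be composed for a GICv2M frame. The register offsets (MSI_TYPER at 0x008, MSI_SETSPI_NS at 0x040) and the TYPER field layout come from my reading of the Linux irq-gic-v2m driver and should be checked against the GICv2m specification; the frame base address used in the example is purely hypothetical. What it shows: every device behind the frame writes to the same doorbell, and the data (the SPI number) is whatever the device's MSI registers were programmed with, so nothing in the frame ties an SPI to a particular device.

```c
#include <stdint.h>

/* Offsets within a GICv2m MSI frame (as used by the Linux driver;
 * to be double-checked against the spec). */
#define V2M_MSI_TYPER      0x008u
#define V2M_MSI_SETSPI_NS  0x040u

/* MSI_TYPER: bits [25:16] = first SPI INTID, bits [9:0] = number of SPIs. */
static uint32_t v2m_typer_base(uint32_t typer) { return (typer >> 16) & 0x3FFu; }
static uint32_t v2m_typer_num(uint32_t typer)  { return typer & 0x3FFu; }

struct msi_msg {
	uint64_t address;  /* doorbell the device writes to */
	uint32_t data;     /* SPI INTID to raise */
};

/* Compose the message for the n-th SPI of a frame. The address is the
 * same for every device behind the frame; only the data differs, and
 * the data is chosen by whoever programs the device. */
static int v2m_compose_msg(uint64_t frame_base, uint32_t typer,
			   uint32_t n, struct msi_msg *msg)
{
	if (n >= v2m_typer_num(typer))
		return -1;  /* outside the SPI range of this frame */
	msg->address = frame_base + V2M_MSI_SETSPI_NS;
	msg->data = v2m_typer_base(typer) + n;
	return 0;
}
```

For a frame whose MSI_TYPER reports base SPI 64 and 32 SPIs, message n=5 is frame_base + 0x40 with data 69. A device assigned to a guest that can DMA to that same doorbell could just as well write 70, i.e. an SPI owned by someone else.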

I would be grateful if you could tell me whether it makes any sense.

Thanks in advance

Best Regards

Eric


1) GICv2M with a single 4kB frame
All devices having this msi-controller as msi-parent share this
single MSI frame. Those devices can work on behalf of the host
or on behalf of one or more guests (KVM assigned devices). We
must make sure that either only the host or a single VM can access
the frame to guarantee interrupt integrity: a device assigned
to one VM must not be able to trigger MSIs targeted at the host
or at another VM.
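The invariant above ("either the host only or one single VM") can be sketched as a single-owner claim on the frame. This is a hypothetical model for illustration only; the owner types and the frame_claim() helper are not existing kernel structures.

```c
#include <stdbool.h>

/* Hypothetical model of who may currently use an MSI frame. */
enum owner_kind { OWNER_NONE, OWNER_HOST, OWNER_VM };

struct msi_frame_owner {
	enum owner_kind kind;
	int vm_id;            /* meaningful only when kind == OWNER_VM */
};

/* Try to give the frame to `want`. Succeeds if the frame is free or
 * already owned by the very same party (host, or the same VM). */
static bool frame_claim(struct msi_frame_owner *f, struct msi_frame_owner want)
{
	if (f->kind == OWNER_NONE) {
		*f = want;
		return true;
	}
	if (f->kind != want.kind)
		return false;     /* host vs VM: never mixed */
	return f->kind == OWNER_HOST || f->vm_id == want.vm_id;
}
```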

I would propose extending the VFIO notion of group viability.
Currently a VFIO group is viable if all devices belonging to the
group are either bound to a VFIO driver or unbound.
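For reference, the current rule boils down to the following check. This is a simplified, hypothetical model (the real VFIO code also tolerates a few whitelisted drivers), with a made-up dev_info record standing in for the kernel's device structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a device's driver binding. */
enum binding { UNBOUND, BOUND_VFIO, BOUND_HOST };

struct dev_info {
	const char *name;
	enum binding binding;
};

/* Current rule: a group is viable iff every member is bound to a
 * VFIO driver or not bound to any driver at all. */
static bool group_viable(const struct dev_info *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].binding == BOUND_HOST)
			return false;
	return true;
}
```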

Let's imagine we extend the viability check as follows:

0) keep the current viability check: all the devices belonging to
the group must be VFIO bound or unbound.
1) retrieve the MSI parent of the device and list all the
other devices using that MSI controller as MSI parent (this does not
look straightforward).
2) those devices must be VFIO driver bound or unbound as well (meaning
they are not used by the host). If not, reject the device attachment.
- in case they are VFIO bound (a VFIO group is set):
x if all their VFIO containers are the same as the container of the
device we are trying to attach, that's OK. The other devices may
use different IOMMU mappings and will eventually target the
MSI frame, but they all work for the same user space client/VM.
x if one or more devices have a different container than the device
under attachment, they work on behalf of a different user space
client/VM, so we cannot attach the new device. I think there is a
case, however, where several containers can be opened by a single QEMU.
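Steps 1) and 2) could look roughly like this, assuming the list of devices sharing an MSI parent has already been obtained (the part that does not look straightforward). Everything here is hypothetical: the msi_parent and container ids are plain integers, and a VFIO-bound device not yet attached to any container (container == -1) is assumed not to block the attachment, since it would go through the same check when it is attached later.

```c
#include <stdbool.h>
#include <stddef.h>

enum binding { UNBOUND, BOUND_VFIO, BOUND_HOST };

/* Hypothetical device record: binding, MSI parent id and, when
 * VFIO bound and attached, its container id (-1 = none yet). */
struct dev_info {
	const char *name;
	enum binding binding;
	int msi_parent;
	int container;
};

/* Extended check for attaching `dev` to `container`: every OTHER device
 * sharing dev's MSI parent must be unbound, or VFIO bound and either
 * not in a container yet or in the very same container. */
static bool msi_parent_viable(const struct dev_info *all, size_t n,
			      const struct dev_info *dev, int container)
{
	for (size_t i = 0; i < n; i++) {
		const struct dev_info *o = &all[i];

		if (o == dev || o->msi_parent != dev->msi_parent)
			continue;
		if (o->binding == BOUND_HOST)
			return false;  /* the host still uses the frame */
		if (o->binding == BOUND_VFIO && o->container >= 0 &&
		    o->container != container)
			return false;  /* another user space client/VM */
	}
	return true;
}
```

Step 0) remains the existing group check; this function only adds the MSI-parent dimension on top of it.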

Of course the dynamic aspects, i.e. a new device showing up or an unbind
event, bring significant complexity.

2) GICv2M with multiple 4kB frames
Each MSI frame is enumerated as an msi-controller. The device tree
statically defines which device is attached to each MSI frame.
When devices are assigned we cannot change this attachment
anyway, since there might be physical constraints behind it.
So devices likely to be assigned to guests should be linked to a
different MSI frame than devices that are not.
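The layout recommendation above can be expressed as a static sanity check over the device-to-frame mapping. The dt_dev record and the "assignable" flag are hypothetical: the device tree does not carry such a flag, it stands for the platform integrator's intent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical static description derived from the device tree:
 * which MSI frame each device targets, and whether the platform
 * intends it to be assignable to a guest. */
struct dt_dev {
	const char *name;
	int msi_frame;
	bool assignable;
};

/* True if no MSI frame mixes assignable and host-only devices. */
static bool frames_partitioned(const struct dt_dev *d, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (d[i].msi_frame == d[j].msi_frame &&
			    d[i].assignable != d[j].assignable)
				return false;
	return true;
}
```

Even with a well-partitioned layout, two assignable devices sharing a frame could still be handed to different VMs, so the run-time extended viability check is still needed per frame.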

I think the extended viability concept can be used here as well.

This model is still not ideal: if an SR-IOV device is plugged onto a
host bridge attached to a single MSI parent, you won't be able to have
one Virtual Function (VF) working for the host and another VF working
for a guest. Only interrupt translation (ITS) will bring that
feature.

3) GICv3 ITS
This one supports an interrupt translation service, similar to Intel
IRQ remapping.
This means a single frame can be used by all devices. A deviceID is
used exclusively by either the host or a guest. I assume the ITS driver
allocates/populates a per-deviceID interrupt translation table (ITT)
with separate LPI spaces, i.e. by construction different ITTs cannot
contain the same LPIs. So there is no need for the extended viability test.
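Here is a rough sketch of the "separate LPI spaces by construction" assumption. In the ITS, the DeviceID is derived by the hardware from the requester rather than supplied by the device's write, and it selects the ITT, so a device can only reach the LPIs mapped in its own table. The allocator below is a hypothetical bump allocator over a made-up 1024-entry pool starting at INTID 8192 (the first LPI); the real driver's allocator is different, but the disjointness property is the same idea.

```c
#include <stdbool.h>
#include <stdint.h>

#define LPI_BASE 8192u   /* first LPI INTID in GICv3 */
#define LPI_POOL 1024u   /* hypothetical pool size for this sketch */

/* Hypothetical bump allocator: each DeviceID gets its own contiguous,
 * never-reused LPI range, so two ITTs never map the same LPI. */
struct lpi_alloc { uint32_t next; };

struct itt {
	uint32_t device_id;
	uint32_t lpi_first;
	uint32_t nr_lpis;
};

static bool itt_alloc(struct lpi_alloc *a, uint32_t device_id,
		      uint32_t nr, struct itt *out)
{
	if (nr == 0 || nr > LPI_POOL - a->next)
		return false;   /* pool exhausted */
	out->device_id = device_id;
	out->lpi_first = LPI_BASE + a->next;
	out->nr_lpis = nr;
	a->next += nr;
	return true;
}

/* Two ITTs are disjoint if their LPI ranges do not overlap. */
static bool itt_disjoint(const struct itt *x, const struct itt *y)
{
	return x->lpi_first + x->nr_lpis <= y->lpi_first ||
	       y->lpi_first + y->nr_lpis <= x->lpi_first;
}
```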

The MSI controller should have a property telling whether
it supports interrupt translation. This kind of property currently
exists on the IOMMU side for Intel IRQ remapping.
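With such a property, the attach decision could be gated as follows. The msi_ctrl structure and its irq_translation flag are hypothetical, meant to mirror in spirit the IOMMU_CAP_INTR_REMAP capability that exists on the IOMMU side.

```c
#include <stdbool.h>

/* Hypothetical MSI controller descriptor with a capability flag. */
struct msi_ctrl {
	const char *name;
	bool irq_translation;   /* true for an ITS-like controller */
};

/* Combine the existing group check with the extended MSI-parent
 * check, skipping the latter when the controller translates IRQs. */
static bool attach_allowed(const struct msi_ctrl *ctrl,
			   bool group_viable, bool msi_parent_viable)
{
	if (!group_viable)
		return false;          /* step 0 always applies */
	if (ctrl->irq_translation)
		return true;           /* ITS: DeviceID isolates interrupts */
	return msi_parent_viable;  /* GICv2M: extended check required */
}
```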