On Tue, 12 Nov 2024 21:34:30 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
On 2024-11-09 5:48 am, Nicolin Chen wrote:
To solve this problem the VMM should capture the MSI IOVA allocated by the
guest kernel and relay it to the GIC driver in the host kernel, to program
the correct MSI IOVA. And this requires a new ioctl via VFIO.
Once VFIO has that information from userspace, though, do we really need
the whole complicated dance to push it right down into the irqchip layer
just so it can be passed back up again? AFAICS
vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
rewrites MSI-X vectors, so it seems like it should be pretty
straightforward to override the message address in general at that
level, without the lower layers having to be aware at all, no?
Didn't see that clearly!! It works with a simple following override:
--------------------------------------------------------------------
@@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
struct msi_msg msg;
get_cached_msi_msg(irq, &msg);
+ if (vdev->msi_iovas) {
+ msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
+ msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
+ }
pci_write_msi_msg(irq, &msg);
}
--------------------------------------------------------------------
With that, I think we only need one VFIO change for this part :)
Wow, is that really OK from a layering perspective? The comment is
pretty clear on the intention that this is to resync the irq layer
view of the device with the physical HW.
Editing the msi_msg while doing that resync smells bad.
Also, this is only doing MSI-X, we should include normal MSI as
well. (it probably should have a resync too?)
This was added for a specific IBM HBA that clears the vector table
during a built-in self test, so it's possible the MSI table being in
config space never had the same issue, or we just haven't encountered
it. I don't expect anything else actually requires this.
I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
context)
It seems suspect to me too. In a sense it is still just synchronizing
the MSI address, but to a different address space.
Is it possible to do this with the existing write_msi_msg callback on
the msi descriptor? For instance we could simply translate the msg
address and call pci_write_msi_msg() (while avoiding an infinite
recursion). Or maybe there should be an xlate_msi_msg callback we can
register. Or I suppose there might be a way to insert an irqchip that
does the translation on write. Thanks,