Re: [PATCH 5/5] x86/kvm: Add KVM_FEATURE_MSI_EXT_DEST_ID

From: Thomas Gleixner
Date: Thu Oct 08 2020 - 17:14:32 EST


On Thu, Oct 08 2020 at 17:08, David Woodhouse wrote:
> On Thu, 2020-10-08 at 13:55 +0100, David Woodhouse wrote:
>
> (We'd want the x86_vector_domain to actually have an MSI compose
> function in the !CONFIG_PCI_MSI case if we did this, of course.)

The compose function and the vector domain wrapper can simply move to vector.c

> From 2fbc79588d4677ee1cc9df661162fcf1a7da57f0 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Date: Thu, 8 Oct 2020 15:44:42 +0100
> Subject: [PATCH 6/5] x86/ioapic: Generate RTE directly from parent irqchip's MSI
> message
>
> The IOAPIC generates an MSI cycle with address/data bits taken from its
> Redirection Table Entry in some combination which used to make sense,
> but now is just a bunch of bits which get passed through in some
> seemingly arbitrary order.
>
> Instead of making IRQ remapping drivers directly frob the IOAPIC RTE,
> let them just do their job and generate an MSI message. The bit
> swizzling to turn that MSI message into the IOAPIC's RTE is the same in
> all cases, since it's a function of the IOAPIC hardware. The IRQ
> remappers have no real need to get involved with that.
>
> The only slight caveat is that the IOAPIC is interpreting some of
> those fields too, and it does want the 'vector' field to be unique
> to make EOI work. The AMD IOMMU happens to put its IRTE index in the
> bits that the IOAPIC thinks are the vector field, and accommodates
> this requirement by reserving the first 32 indices for the IOAPIC.
> The Intel IOMMU doesn't actually use the bits that the IOAPIC thinks
> are the vector field, so it fills in the 'pin' value there instead.
>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---
> arch/x86/include/asm/hw_irq.h | 11 +++---
> arch/x86/include/asm/msidef.h | 2 ++
> arch/x86/kernel/apic/io_apic.c | 55 ++++++++++++++++++-----------
> drivers/iommu/amd/iommu.c | 14 --------
> drivers/iommu/hyperv-iommu.c | 31 ----------------
> drivers/iommu/intel/irq_remapping.c | 19 +++-------
> 6 files changed, 46 insertions(+), 86 deletions(-)

Nice :)

> +static void mp_swizzle_msi_dest_bits(struct irq_data *irq_data, void *_entry)
> +{
> + struct msi_msg msg;
> + u32 *entry = _entry;
> +
> + irq_chip_compose_msi_msg(irq_data, &msg);

Duh, for some stupid reason it never occured to me to do it that
way.

Historically the MSI compose function was part of the MSI PCI chip and I
just changed that recently when I reworked the code to make IMS support
possible.

Historical blinders are pretty sticky :(

> + /*
> + * They're in a bit of a random order for historical reasons, but
> + * the IO/APIC is just a device for turning interrupt lines into
> + * MSIs, and various bits of the MSI addr/data are just swizzled
> + * into/from the bits of Redirection Table Entry.
> + */
> + entry[0] &= 0xfffff000;
> + entry[0] |= (msg.data & (MSI_DATA_DELIVERY_MODE_MASK |
> + MSI_DATA_VECTOR_MASK));
> + entry[0] |= (msg.address_lo & MSI_ADDR_DEST_MODE_MASK) << 9;
> +
> + entry[1] &= 0xffff;
> + entry[1] |= (msg.address_lo & MSI_ADDR_DEST_ID_MASK) << 12;
> +}

....

> switch (info->type) {
> case X86_IRQ_ALLOC_TYPE_IOAPIC:
> - /* Setup IOAPIC entry */
> - entry = info->ioapic.entry;
> - info->ioapic.entry = NULL;
> - memset(entry, 0, sizeof(*entry));
> - entry->vector = index;
> - entry->mask = 0;
> - entry->trigger = info->ioapic.trigger;
> - entry->polarity = info->ioapic.polarity;
> - /* Mask level triggered irqs. */
> - if (info->ioapic.trigger)
> - entry->mask = 1;
> - break;
> -

Thanks for cleaning this up!

tglx