Re: [RFC PATCH] irqchip/gic-v3-its: enable dynamic MSI-X allocation

From: Jinqian Yang

Date: Wed Jun 24 2026 - 05:30:11 EST




On 2026/6/24 15:07, Marc Zyngier wrote:
On Wed, 24 Jun 2026 03:53:45 +0100,
Jinqian Yang <yangjinqian1@xxxxxxxxxx> wrote:

On ARM64 platforms with GICv3 ITS, VFIO PCI passthrough currently
cannot dynamically allocate MSI-X vectors after MSI-X has been
enabled. When QEMU needs to extend the vector range, it must
disable MSI-X, free all interrupts, then re-enable with a larger
allocation. This creates an interrupt loss window for already-active
vectors.
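To make the loss window concrete, here is a toy user-space model in C. All names are hypothetical and this is not QEMU or VFIO code. It compares two ways of growing a device from 8 to 16 vectors. The first tears everything down and reallocates, which briefly unmaps every already-active vector. The second allocates only the additional vectors, which leaves the existing ones mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_VECS 64 /* hypothetical MSI-X table size */

/* Toy model of a passed-through device's MSI-X vectors (hypothetical).
 * "mapped" means an interrupt on that vector is delivered; "gap" records
 * that a vector in use was unmapped at some point, i.e. an interrupt
 * arriving during that time could have been lost. */
struct msix_model {
	size_t nr_vecs;
	bool mapped[MODEL_MAX_VECS];
	bool gap[MODEL_MAX_VECS];
};

static void model_init(struct msix_model *m, size_t nr_vecs)
{
	assert(nr_vecs <= MODEL_MAX_VECS);
	memset(m, 0, sizeof(*m));
	for (size_t i = 0; i < nr_vecs; i++)
		m->mapped[i] = true;
	m->nr_vecs = nr_vecs;
}

/* NORESIZE path: disable MSI-X, free every vector, re-enable with more. */
static void grow_by_reallocation(struct msix_model *m, size_t new_total)
{
	assert(new_total >= m->nr_vecs && new_total <= MODEL_MAX_VECS);
	for (size_t i = 0; i < m->nr_vecs; i++) {
		m->mapped[i] = false; /* window opens for already-active vectors */
		m->gap[i] = true;
	}
	for (size_t i = 0; i < new_total; i++)
		m->mapped[i] = true;  /* window closes */
	m->nr_vecs = new_total;
}

/* Dynamic path: allocate only the additional vectors. */
static void grow_dynamically(struct msix_model *m, size_t new_total)
{
	assert(new_total >= m->nr_vecs && new_total <= MODEL_MAX_VECS);
	for (size_t i = m->nr_vecs; i < new_total; i++)
		m->mapped[i] = true;
	m->nr_vecs = new_total;
}

/* Number of vectors that went through an unmapped window while in use. */
static size_t count_gaps(const struct msix_model *m)
{
	size_t n = 0;

	for (size_t i = 0; i < m->nr_vecs; i++)
		n += m->gap[i];
	return n;
}
```

For example, after model_init(&m, 8) and grow_by_reallocation(&m, 16), count_gaps(&m) reports 8. With grow_dynamically() it reports 0.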

Consider HNS3 with RoCE: NIC and RDMA share one PCI device and
ITS DeviceID, and the MSI-X vectors are partitioned into a NIC range
(the lower vectors) followed by a RoCE range starting at
base_vector = num_nic_msi. In VFIO passthrough, loading hns_roce after
hns3 forces QEMU to tear down all interrupts before re-allocating the
larger range, and any NIC interrupt raised during that teardown may be
lost. Testing confirmed that this occasionally happens and causes the
network port reset to fail.

Well, that's what you get for not exposing differentiated functions.
Eventually, you face the reality that this is a poor design.


Fair point, though this is not unique to HNS3: other major NIC+RDMA
vendors also put both on the same PCI function.


ITS_MSI_FLAGS_SUPPORTED lacks MSI_FLAG_PCI_MSIX_ALLOC_DYN, so
pci_msix_can_alloc_dyn() returns false. VFIO then sets
has_dyn_msix = false and keeps reporting VFIO_IRQ_INFO_NORESIZE for the
MSI-X index, so QEMU falls back to the old "disable and reallocate"
behavior.
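The dependency chain above can be sketched as a small C model. The MODEL_* constants and both functions are hypothetical stand-ins with made-up bit values. They only approximate the checks done by pci_msix_can_alloc_dyn() and by VFIO's IRQ-info reporting for the MSI-X index. They are not the kernel implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: these bit values are for illustration only and
 * are not the kernel's MSI_FLAG_* values. */
#define MODEL_MSI_FLAG_PCI_MSIX           (1u << 0)
#define MODEL_MSI_FLAG_MULTI_PCI_MSI      (1u << 1)
#define MODEL_MSI_FLAG_PCI_MSIX_ALLOC_DYN (1u << 2)

/* Named after the VFIO UAPI flags; values chosen for this model only. */
#define MODEL_VFIO_IRQ_INFO_EVENTFD  (1u << 0)
#define MODEL_VFIO_IRQ_INFO_NORESIZE (1u << 3)

static_assert((MODEL_MSI_FLAG_PCI_MSIX_ALLOC_DYN &
	       (MODEL_MSI_FLAG_PCI_MSIX | MODEL_MSI_FLAG_MULTI_PCI_MSI)) == 0,
	      "model flag bits must not overlap");

/* Roughly the question pci_msix_can_alloc_dyn() answers: does the device
 * have an MSI-X capability, and does its MSI parent domain advertise
 * dynamic allocation in its supported-flags mask? */
static bool model_msix_can_alloc_dyn(bool has_msix_cap, uint32_t parent_flags)
{
	return has_msix_cap && (parent_flags & MODEL_MSI_FLAG_PCI_MSIX_ALLOC_DYN);
}

/* Roughly what VFIO reports for the MSI-X index: NORESIZE is set
 * unless has_dyn_msix is true. */
static uint32_t model_msix_irq_info_flags(bool has_dyn_msix)
{
	uint32_t flags = MODEL_VFIO_IRQ_INFO_EVENTFD;

	if (!has_dyn_msix)
		flags |= MODEL_VFIO_IRQ_INFO_NORESIZE;
	return flags;
}
```

If the parent mask lacks MODEL_MSI_FLAG_PCI_MSIX_ALLOC_DYN, model_msix_irq_info_flags() includes the NORESIZE bit. Adding the flag, as the patch does for ITS_MSI_FLAGS_SUPPORTED, clears it.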

The essential prerequisite for enabling this flag is the fix to
msi_prepare() call timing (commit 1396e89e09f0 ("genirq/msi: Move
prepare() call to per-device allocation")): msi_prepare() is
now called once, at per-device domain creation, with hwsize, so the
ITS driver creates an ITT large enough for all MSI-X vectors.
Before this fix, msi_prepare() was called on each allocation with
whatever nvec that allocation requested, which could leave the ITT
too small for vectors added later.

How is this paragraph relevant? The kernel has had this fix for over a
year, and backporting this series is not something I plan to ever do.


Will remove it from the commit message.


With this in place, dynamic MSI-X allocation works correctly:
msi_domain_alloc_irq_at() uses populate_alloc_info() to copy the
pre-prepared alloc_data without re-invoking msi_prepare(), so each
new vector simply gets an LPI mapped through a new entry in the
already-allocated ITT, without affecting existing vectors.
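Here is a minimal sketch of the ITT side, under simplifying assumptions. The sizing rule (at least two entries, rounded up to a power of two), the LPI allocator, and every model_* name are hypothetical and not taken from the GICv3 ITS driver. The sketch shows why the ITT must be sized once from hwsize rather than from the first allocation. With 8 NIC vectors, an ITT sized from nvec = 8 has no room for vector 8. One sized from an assumed hwsize of 64 maps vector 8 without touching vectors 0-7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one device's ITT: slot i holds the LPI that event
 * ID i (the MSI-X vector index) is mapped to, or 0 if unmapped. This is an
 * illustration, not the ITS driver's data structure. */
struct model_itt {
	size_t nr_entries;      /* capacity, fixed when the ITT is created */
	unsigned int *lpi;      /* nr_entries slots */
	unsigned int next_lpi;  /* hypothetical allocator; GICv3 LPI INTIDs start at 8192 */
};

/* Simplified sizing rule (assumption): at least 2 entries, power of two. */
static size_t model_itt_size(size_t nvec)
{
	size_t n = 2;

	while (n < nvec)
		n <<= 1;
	return n;
}

/* Create the ITT once, sized from nvec; returns false on allocation failure. */
static bool model_itt_create(struct model_itt *itt, size_t nvec)
{
	itt->nr_entries = model_itt_size(nvec);
	itt->lpi = calloc(itt->nr_entries, sizeof(*itt->lpi));
	itt->next_lpi = 8192;
	return itt->lpi != NULL;
}

/* Map one new event at a caller-chosen index, in the spirit of
 * msi_domain_alloc_irq_at(): only the target slot is written, so existing
 * mappings are untouched. Returns the new LPI, or 0 on failure. */
static unsigned int model_alloc_at(struct model_itt *itt, size_t event_id)
{
	if (event_id >= itt->nr_entries || itt->lpi[event_id] != 0)
		return 0; /* beyond ITT capacity, or slot already in use */
	itt->lpi[event_id] = itt->next_lpi++;
	return itt->lpi[event_id];
}

static void model_itt_destroy(struct model_itt *itt)
{
	free(itt->lpi);
	itt->lpi = NULL;
	itt->nr_entries = 0;
}
```

The failure mode in the too-small case is an out-of-range event ID. The fix does not change individual mappings. It only ensures that capacity exists up front.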

Signed-off-by: Jinqian Yang <yangjinqian1@xxxxxxxxxx>
---
drivers/irqchip/irq-gic-its-msi-parent.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-its-msi-parent.c b/drivers/irqchip/irq-gic-its-msi-parent.c
index b9257103a999..b2b9d2068bb1 100644
--- a/drivers/irqchip/irq-gic-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-its-msi-parent.c
@@ -18,7 +18,8 @@
 
 #define ITS_MSI_FLAGS_SUPPORTED	(MSI_GENERIC_FLAGS_MASK |	\
 				 MSI_FLAG_PCI_MSIX |		\
-				 MSI_FLAG_MULTI_PCI_MSI)
+				 MSI_FLAG_MULTI_PCI_MSI |	\
+				 MSI_FLAG_PCI_MSIX_ALLOC_DYN)
 
 static int its_translate_frame_address(struct fwnode_handle *msi_node, phys_addr_t *pa)
 {

What has this been tested with? In which conditions?


Tested on HiSilicon HIP09 (ARM64, GICv3/GICv4.1) with the latest
upstream kernel and QEMU 8.2.

VFIO passthrough of an HNS3 NIC to a VM: load both the hns3 and
hns_roce_hw_v2 drivers, then trigger an FLR. Without the flag,
QEMU disables and re-enables MSI-X around the FLR, occasionally
causing link-up failures due to lost interrupts.

Thanks,
Jinqian