Re: MSIs not freed in GICv3 ITS driver
From: Qiang Yu
Date: Tue Mar 03 2026 - 00:24:19 EST
On Thu, Feb 26, 2026 at 01:39:35PM +0000, Marc Zyngier wrote:
> On Wed, 25 Feb 2026 09:34:41 +0000,
> Qiang Yu <qiang.yu@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Feb 19, 2026 at 04:54:29PM +0000, Marc Zyngier wrote:
> > > On Fri, 16 Jan 2026 15:03:33 +0000,
> > > Manivannan Sadhasivam <mani@xxxxxxxxxx> wrote:
> > > >
> > > > Hi Marc,
> > > >
> > > > Looks like this has fallen through the cracks and my colleague internally
> > > > reported a warning during the removal of a PCI driver and it seems to be related
> > > > to the issue we were discussing in this thread:
> > > >
> > > > [ 54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> > > > [ 54.738366] Modules linked in: mhi_pci_generic mhi nvme_core usb_f_fs libcomposite sm3_ce nvmem_qcom_spmi_sdam qcom_pon rtc_pm8xxx qcom_spmi_temp_alarm qcom_stats dispcc_glymur gpi llcc_qcom phy_qcom_qmp_pcie qcom_cpucp_mbox qcom_wdt socinfo
> > > > [ 54.760588] CPU: 4 UID: 0 PID: 115 Comm: kworker/u73:1 Tainted: G W 6.18.0-next-20251210-14099-gc20082c23661-dirty #2 PREEMPT
> > > > [ 54.774067] Tainted: [W]=WARN
> > > > [ 54.777412] Hardware name: Qualcomm MTP/Qualcomm Test Device, BIOS 7.0.251121.BOOT.OSSUEFI.3.1-00008-GLYMUR-1 11/21/2025
> > > > [ 54.788849] Workqueue: async async_run_entry_fn
> > > > [ 54.793791] pstate: 21400009 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > > [ 54.801230] pc : its_msi_teardown+0x11c/0x13c
> > > > [ 54.805997] lr : its_msi_teardown+0x54/0x13c
> > > > [ 54.810675] sp : ffff8000837cb710
> > > > [ 54.814373] x29: ffff8000837cb710 x28: ffff00080190e410 x27: ffff0008085ba390
> > > > [ 54.821985] x26: ffff000808629bf0 x25: 0000000000000000 x24: 0000000000000066
> > > > [ 54.829602] x23: 0000000000000007 x22: 0000000000000020 x21: ffff000800059608
> > > > [ 54.837209] x20: ffff000800059607 x19: ffff000800a4a300 x18: 00000000ffffffff
> > > > [ 54.844819] x17: ffff00080ec65400 x16: ffff00080ec65200 x15: ffff00080ec65000
> > > > [ 54.852429] x14: 0000000000000004 x13: ffff0008000b8810 x12: 0000000000000000
> > > > [ 54.860046] x11: ffff0008007798e8 x10: 0000000000000002 x9 : 0000000000000001
> > > > [ 54.867661] x8 : ffff0008007796f8 x7 : 000000000000001f x6 : ffff8000837cb640
> > > > [ 54.875277] x5 : ffff000801918f40 x4 : 0000000000000007 x3 : 0000000000000000
> > > > [ 54.882891] x2 : ffff000800a037c0 x1 : 0000000000000020 x0 : 0000000000000007
> > > > [ 54.890509] Call trace:
> > > > [ 54.893320] its_msi_teardown+0x11c/0x13c (P)
> > > > [ 54.898082] its_msi_teardown+0x34/0x44
> > > > [ 54.902316] msi_remove_device_irq_domain+0x70/0x114
> > > > [ 54.907701] msi_device_data_release+0x20/0x64
> > > > [ 54.912551] devres_release_all+0xa4/0x104
> > >
> > > That's nowhere near enough information for me to do anything about it.
> > >
> > > Unless you describe exactly what device this is, its allocation
> > > requirements, the topology of the system and finally reproduce it on a
> > > vanilla kernel and not something that I have no access to, I can't do
> > > much for you.
> >
> > Hi Marc,
> >
> > Thanks for the feedback. I can reproduce this issue with latest linux-next
> > tag next-20260224.
>
> Please don't test on -next. Pick the latest tag from Linus. As far as
> I am concerned, -next bears no relevance whatsoever.
I reproduced the same issue on the latest tag from Linus:
[ 922.657743] WARNING: drivers/irqchip/irq-gic-v3-its.c:3642 at its_msi_teardown+0x11c/0x13c, CPU#0: rmmod/490
[ 922.815187] CPU: 0 UID: 0 PID: 490 Comm: rmmod Tainted: G S 7.0.0-rc2-00005-gaf4e9ef3d784 #1 PREEMPT
[ 922.826202] Tainted: [S]=CPU_OUT_OF_SPEC
[ 922.830254] Hardware name: Qualcomm Technologies, Inc. SM8550 HDK (DT)
[ 922.836980] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 922.844158] pc : its_msi_teardown+0x11c/0x13c
[ 922.848664] lr : its_msi_teardown+0x54/0x13c
[ 922.853075] sp : ffff800080a9bb40
[ 922.856497] x29: ffff800080a9bb40 x28: ffff00081055dd00 x27: 0000000000000000
[ 922.863855] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 922.871215] x23: 0000000000000007 x22: 0000000000000020 x21: ffff000800060c08
[ 922.878575] x20: ffff000800060c07 x19: ffff000800968280 x18: 00000000ffffffff
[ 922.885931] x17: 00000000000000e3 x16: 00000000000000e2 x15: 00000000000000e1
[ 922.893289] x14: 0000000000000004 x13: ffff000800288210 x12: 0000000000000000
[ 922.900646] x11: ffff0008013c7ce0 x10: 0000000000000002 x9 : 0000000000000001
[ 922.908006] x8 : ffff0008013c7b78 x7 : 000000000000001f x6 : ffff800080a9ba70
[ 922.915364] x5 : 000000000000003c x4 : 0000000000000007 x3 : 0000000000000000
[ 922.922722] x2 : ffff00081072e840 x1 : 0000000000000020 x0 : 0000000000000007
[ 922.930082] Call trace:
[ 922.932624] its_msi_teardown+0x11c/0x13c (P)
[ 922.937133] its_msi_teardown+0x34/0x44
[ 922.941099] msi_remove_device_irq_domain+0x70/0x114
[ 922.946226] msi_device_data_release+0x20/0x64
[ 922.950824] devres_release_all+0xa4/0x104
[ 922.955070] device_unbind_cleanup+0x18/0x84
[ 922.959484] device_release_driver_internal+0x1f4/0x230
[ 922.964878] driver_detach+0x50/0x98
[ 922.968578] bus_remove_driver+0x6c/0xbc
[ 922.972641] driver_unregister+0x30/0x60
[ 922.976697] pci_unregister_driver+0x24/0x9c
[ 922.981110] mhi_pci_driver_exit+0x18/0xe0c [mhi_pci_generic]
[ 922.987051] __arm64_sys_delete_module+0x1b8/0x2a4
[ 922.992007] invoke_syscall+0x48/0x110
[ 922.995896] el0_svc_common.constprop.0+0x40/0xe0
[ 923.000755] do_el0_svc+0x1c/0x28
[ 923.004197] el0_svc+0x34/0x10c
[ 923.007455] el0t_64_sync_handler+0xa0/0xe4
[ 923.011773] el0t_64_sync+0x198/0x19c
[ 923.015570] ---[ end trace 0000000000000000 ]---
>
> >
> > The host is Glymur (Qualcomm compute platform) with an SDX75 modem
> > connected via PCIe. The SDX75 driver requests 7 MSI IRQs, and the warning
> > triggers during driver removal.
> >
> > I think this is actually a common problem with how we handle
> > MSI allocation vs freeing. Here's what I'm seeing:
> >
> > When allocating, irq_domain_alloc_irqs_hierarchy() makes one call to
> > domain->ops->alloc() with nr_irqs=7. The MSI controller (ITS in this case
> > but DWC-MSI has similar behavior) looks for a power-of-2 sized region in its
> > bitmap, so it allocates 8 contiguous bits to satisfy the 7 IRQ request.
>
> Well, it's not like the ITS has a choice. Given that the ITT size is
> expressed in a number of bits, you get the choice between a power of
> two or absolutely nothing.
>
> I'm not going to comment on the DWC stuff, as it has been bitrotting
> for the best part of two decades.
Okay, so does each PCI driver have to request a power-of-2 number of MSI
IRQs?
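
To make the rounding concrete, here is a minimal sketch of the arithmetic
only. round_up_pow2() is a hypothetical helper written for this mail, not a
function from the ITS driver or the PCI core:

```c
#include <assert.h>

/*
 * Hypothetical helper (not kernel code): smallest power of two that is
 * greater than or equal to n. Models how a request for n vectors is
 * widened when the underlying table can only be sized in powers of two.
 */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}
```

With this helper, a request for 7 vectors is widened to 8, leaving one slot
that the driver never asked for, while a request for 8 is left unchanged.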
>
> >
> > But when freeing, irq_domain_free_irqs_hierarchy() loops and calls
> > domain->ops->free() seven times, each with nr_irqs=1. So we end up freeing
> > 7 individual bits instead of the original 8 bits that were allocated.
> >
> > This allocation/free mismatch seems to corrupt the bitmap tracking, which
> > is what triggers the warning in its_msi_teardown().
> >
> > I suspect this would happen with any PCIe device that requests a
> > non-power-of-2 number of MSI IRQs on systems using ITS or DWC-MSI.
>
> Is this device doing Multi-MSI or MSI-X? Please post an 'lspci -vv' so
> that we know what we are up against.
>
This device only supports the MSI capability (not MSI-X). Below is the
relevant lspci output:
0001:01:00.0 Unassigned class [ff00]: Qualcomm Device 0309
Capabilities: [50] MSI: Enable+ Count=8/32 Maskable+ 64bit+
Address: 00000000fffff040 Data: 0000
Masking: ffffff80 Pending: 00000000
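
To illustrate the allocation/free mismatch I described earlier in this
thread, here is a toy model. It assumes my reading of the alloc and free
paths is correct, and it is not the driver's code: struct toy_map and the
toy_*() functions are made up for this mail and only mimic a
region-based bitmap allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit event map. This is NOT the driver's real bitmap. */
struct toy_map {
	uint32_t bits;
};

/* ceil(log2(n)) for n >= 1 */
static unsigned int order_for(unsigned int n)
{
	unsigned int order = 0;

	while ((1u << order) < n)
		order++;
	return order;
}

/* Mask covering 2^order low bits (order <= 5) */
static uint32_t region_mask(unsigned int order)
{
	unsigned int len = 1u << order;

	return len == 32 ? UINT32_MAX : (1u << len) - 1;
}

/* Claim an aligned, free region of 2^order bits; return its base or -1 */
static int toy_alloc_region(struct toy_map *m, unsigned int order)
{
	unsigned int len = 1u << order;
	unsigned int pos;

	for (pos = 0; pos + len <= 32; pos += len) {
		if (!(m->bits & (region_mask(order) << pos))) {
			m->bits |= region_mask(order) << pos;
			return (int)pos;
		}
	}
	return -1;
}

/* Release 2^order bits starting at pos */
static void toy_release_region(struct toy_map *m, unsigned int pos,
			       unsigned int order)
{
	m->bits &= ~(region_mask(order) << pos);
}

/*
 * Replay the sequence from this thread: one allocation sized for nr
 * interrupts (1 <= nr <= 32), then nr separate frees of one interrupt
 * each. Returns the bits still set afterwards.
 */
static uint32_t toy_replay(unsigned int nr)
{
	struct toy_map m = { 0 };
	int base = toy_alloc_region(&m, order_for(nr));
	unsigned int i;

	assert(base >= 0);
	for (i = 0; i < nr; i++)
		toy_release_region(&m, (unsigned int)base + i, order_for(1));
	return m.bits;
}
```

In this model toy_replay(7) returns 0x80: the eighth bit of the region is
never released, so a later "is the map empty?" check would fail. With
toy_replay(8) the map ends up empty. Whether the real teardown path trips
for the same reason is exactly what I would like to confirm.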
- Qiang Yu