Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases

From: Srivastava, Dheeraj Kumar

Date: Mon Feb 09 2026 - 05:28:46 EST


Hi Magnus,

On 2/5/2026 7:31 PM, Magnus Kalland wrote:
DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
between multiple device IDs. See commit 3c124435e8dd
("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
information on this. However, the AMD IOMMU driver currently invalidates
IRTE cache entries on a per-device basis whenever an IRTE is updated, not
for each alias.

This approach leaves stale IRTE cache entries when an IRTE is cached under
one DMA alias but later updated and invalidated through a different alias.
In such cases, the original device ID is never invalidated, since it is
programmed via aliasing.

This incoherency bug has been observed when IRTEs are cached under one
Non-Transparent Bridge (NTB) DMA alias and later updated via another.

Fix this by invalidating the interrupt remapping table cache for all DMA
aliases when updating an IRTE.

Link: https://lore.kernel.org/linux-iommu/fwtqfdk3m7qrazj4bfutl4grac46agtxztc3p2lqnejt2wyexu@lztyomxrm3pk/
Signed-off-by: Magnus Kalland <magnus@xxxxxxxxxxxxxx>

---

v2:
- Move the lock acquire before branching
- Call iommu_flush_dev_irt() when pdev is null
- Handle pdev refcount in correct branch

drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2e1865daa1ce..b5256b28b0c8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3077,25 +3077,47 @@ const struct iommu_ops amd_iommu_ops = {
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
+static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
+{
+	int ret;
+	struct iommu_cmd cmd;
+	struct amd_iommu *iommu = data;
+
+	build_inv_irt(&cmd, devid);
+	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	return ret;
+}
+
 static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 {
 	int ret;
 	u64 data;
+	int domain = iommu->pci_seg->id;
+	unsigned int bus = PCI_BUS_NUM(devid);
+	unsigned int devfn = devid & 0xff;
 	unsigned long flags;
-	struct iommu_cmd cmd, cmd2;
+	struct iommu_cmd cmd;
+	struct pci_dev *pdev = NULL;
 
 	if (iommu->irtcachedis_enabled)
 		return;
 
-	build_inv_irt(&cmd, devid);
 	data = atomic64_inc_return(&iommu->cmd_sem_val);
-	build_completion_wait(&cmd2, iommu, data);
+	build_completion_wait(&cmd, iommu, data);
+	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
 
 	raw_spin_lock_irqsave(&iommu->lock, flags);
-	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	if (pdev) {
+		ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
+		pci_dev_put(pdev);
+	} else {
+		ret = iommu_flush_dev_irt(NULL, devid, iommu);
+	}
+
 	if (ret)
 		goto out;
-	ret = __iommu_queue_command_sync(iommu, &cmd2, false);
+
+	ret = __iommu_queue_command_sync(iommu, &cmd, false);
 	if (ret)
 		goto out;
 
 	wait_on_sem(iommu, data);


I tested the patch with lockdep (CONFIG_PROVE_LOCKING=y) enabled and observed the following warning in the kernel log.

[ 7.215360] kernel: =============================
[ 7.215360] kernel: [ BUG: Invalid wait context ]
[ 7.215360] kernel: 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 Not tainted
[ 7.215360] kernel: -----------------------------
[ 7.215360] kernel: swapper/0/1 is trying to lock:
[ 7.215360] kernel: ff4a3b3365f62368 (&k->list_lock){+.+.}-{3:3}, at: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: other info that might help us debug this:
[ 7.215360] kernel: context-{5:5}
[ 7.215360] kernel: 2 locks held by swapper/0/1:
[ 7.215360] kernel: #0: ff4a3ad400055650 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xac/0x770
[ 7.215360] kernel: #1: ff4a3ad4000554c0 (&irq_desc_lock_class){-...}-{2:2}, at: __setup_irq+0xe7/0x770
[ 7.215360] kernel: stack backtrace:
[ 7.215360] kernel: CPU: 61 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 PREEMPT(voluntary)
[ 7.215360] kernel: Hardware name: AMD Corporation Titanite_4G/Titanite_4G, BIOS RTI100CC 03/28/2024
[ 7.215360] kernel: Call Trace:
[ 7.215360] kernel: <TASK>
[ 7.215360] kernel: dump_stack_lvl+0x78/0xe0
[ 7.215360] kernel: __lock_acquire+0x836/0xbe0
[ 7.215360] kernel: lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? validate_chain+0x261/0x6e0
[ 7.215360] kernel: ? __pfx_match_pci_dev_by_id+0x10/0x10
[ 7.215360] kernel: _raw_spin_lock+0x34/0x80
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_find_device+0x30/0xd0
[ 7.215360] kernel: ? lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: pci_get_domain_bus_and_slot+0x7d/0x100
[ 7.215360] kernel: iommu_flush_irt_and_complete+0xaa/0x190
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? __modify_irte_ga.isra.0+0x5f/0x80
[ 7.215360] kernel: irq_remapping_activate+0x43/0x80
[ 7.215360] kernel: __irq_domain_activate_irq+0x53/0x90
[ 7.215360] kernel: __irq_domain_activate_irq+0x32/0x90
[ 7.215360] kernel: irq_domain_activate_irq+0x2d/0x50
[ 7.215360] kernel: __setup_irq+0x339/0x770
[ 7.215360] kernel: request_threaded_irq+0xe5/0x190
[ 7.215360] kernel: ? __pfx_acpi_irq+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_ev_sci_xrupt_handler+0x10/0x10
[ 7.215360] kernel: acpi_os_install_interrupt_handler+0xaf/0x100
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_ev_install_xrupt_handlers+0x22/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_bus_init+0x3a/0x460
[ 7.215360] kernel: ? acpi_ut_release_mutex+0x4a/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? acpi_install_address_space_handler_internal.part.0+0x64/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_init+0x5d/0x130
[ 7.215360] kernel: ? __pfx_scan_for_dmi_ipmi+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: do_one_initcall+0x5c/0x370
[ 7.215360] kernel: do_initcalls+0xdb/0x190
[ 7.215360] kernel: kernel_init_freeable+0x2d1/0x420
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: kernel_init+0x1a/0x1c0
[ 7.215360] kernel: ret_from_fork+0x25a/0x280
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: ret_from_fork_asm+0x1a/0x30
[ 7.215360] kernel: </TASK>

From the trace, __setup_irq() is already holding a raw spinlock (irq_desc_lock_class, wait context 2) when iommu_flush_irt_and_complete() is invoked.

pci_get_domain_bus_and_slot() then tries to acquire the bus klist's regular spinlock (wait context 3). Taking a context-3 lock while a context-2 lock is held is what triggers the "Invalid wait context" warning.

Thanks
Dheeraj