Re: [PATCH v3 3/3] iommu/amd: Invalidate IRT cache for DMA aliases

From: Vasant Hegde

Date: Mon Mar 30 2026 - 07:25:03 EST


Hi Magnus,


On 2/26/2026 1:53 AM, Magnus Kalland wrote:
> IRTEs may be shared between multiple device IDs when PCIe DMA
> aliasing is in use. The AMD IOMMU driver currently invalidates
> the interrupt remapping table cache only for the device ID used
> to update the IRTE.
>
> If the same IRTE is cached under a different DMA alias, this
> leaves stale cache entries that are never invalidated.
>
> Iterate over all device IDs sharing the same DMA alias and
> invalidate the IRT cache for each of them when an IRTE is updated.
>
> Co-developed-by: Lars B. Kristiansen <larsk@xxxxxxxxxxxxxx>
> Signed-off-by: Lars B. Kristiansen <larsk@xxxxxxxxxxxxxx>
> Co-developed-by: Jonas Markussen <jonas@xxxxxxxxxxxxxx>
> Signed-off-by: Jonas Markussen <jonas@xxxxxxxxxxxxxx>
> Co-developed-by: Tore H. Larsen <torel@xxxxxxxxx>
> Signed-off-by: Tore H. Larsen <torel@xxxxxxxxx>
> Signed-off-by: Magnus Kalland <magnus@xxxxxxxxxxxxxx>
> Link: https://lore.kernel.org/linux-iommu/26cfa307-6c33-41f9-a7a0-fbf202b38a00@xxxxxxx/
>
> ---
> drivers/iommu/amd/iommu.c | 39 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 5dec3502c8b3..d9a91d1a083e 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3166,6 +3166,31 @@ const struct iommu_ops amd_iommu_ops = {
> static struct irq_chip amd_ir_chip;
> static DEFINE_RAW_SPINLOCK(iommu_table_lock);
>
> +static int iommu_flush_irt_for_aliases(struct amd_iommu *iommu,
> + u16 alias)
> +{
> + struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
> + struct iommu_cmd cmd;
> + unsigned long flags;
> + u32 devid;
> + int ret = 0;
> +
> + raw_spin_lock_irqsave(&iommu_table_lock, flags);

There is a possible deadlock here. Actually, can we remove the lock instead?


path 1) alloc_irq_table() -> holds iommu_table_lock [A] -> iommu->lock [B]
path 2) iommu_flush_irt_and_complete() -> holds iommu->lock [B] ->
        iommu_table_lock [A]
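If pci_seg->alias_table[] is stable by the time interrupts are being flushed (that needs to be verified), the alias lookup could be done locklessly, which breaks the A->B / B->A ordering cycle above. Untested sketch:

```c
/*
 * Untested sketch: read the alias without taking iommu_table_lock,
 * assuming pci_seg->alias_table[] no longer changes at this point.
 * This avoids nesting iommu_table_lock inside iommu->lock.
 */
u16 alias = READ_ONCE(iommu->pci_seg->alias_table[devid]);
```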


> +
> + for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
> + if (pci_seg->alias_table[devid] != alias)

This is a heavy hammer: it walks every device ID in the segment on each
flush. Why not use pci_for_each_dma_alias() like we do in the DTE flush path?
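Untested sketch of what that could look like, following the clone_aliases() pattern (flush_irt_alias() is a made-up helper, and the caller would need the struct pci_dev for the devid):

```c
/* Made-up callback: invalidate the IRT cache for one DMA alias. */
static int flush_irt_alias(struct pci_dev *pdev, u16 alias, void *data)
{
	struct amd_iommu *iommu = data;
	struct iommu_cmd cmd;

	build_inv_irt(&cmd, alias);
	return __iommu_queue_command_sync(iommu, &cmd, true);
}

/* Then, instead of scanning 0..last_bdf: */
ret = pci_for_each_dma_alias(pdev, flush_irt_alias, iommu);
```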


-Vasant

> + continue;
> +
> + build_inv_irt(&cmd, devid);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + if (ret)
> + goto out;
> + }
> +
> +out:
> + raw_spin_unlock_irqrestore(&iommu_table_lock, flags);
> + return ret;
> +}
>
> static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> {
> @@ -3173,19 +3198,29 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> u64 data;
> unsigned long flags;
> struct iommu_cmd cmd, cmd2;
> + u16 alias;
>
> if (iommu->irtcachedis_enabled)
> return;
>
> - build_inv_irt(&cmd, devid);
> + raw_spin_lock_irqsave(&iommu_table_lock, flags);
> + alias = iommu->pci_seg->alias_table[devid];
> + raw_spin_unlock_irqrestore(&iommu_table_lock, flags);
>
> raw_spin_lock_irqsave(&iommu->lock, flags);
> data = get_cmdsem_val(iommu);
> build_completion_wait(&cmd2, iommu, data);
>
> - ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + if (alias == devid) {
> + build_inv_irt(&cmd, devid);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + } else {
> + ret = iommu_flush_irt_for_aliases(iommu, alias);
> + }
> +
> if (ret)
> goto out_err;
> +
> ret = __iommu_queue_command_sync(iommu, &cmd2, false);
> if (ret)
> goto out_err;