Re: [PATCH v3 3/3] iommu/amd: Invalidate IRT cache for DMA aliases

From: Vasant Hegde

Date: Wed Apr 01 2026 - 03:48:10 EST


Hi Magnus,


On 3/31/2026 6:18 PM, Magnus Kalland wrote:
> Hi Vasant, and thank you for the review.
>
> On Mon, Mar 30, 2026 at 04:48:23PM +0530, Vasant Hegde wrote:
>> There is a possible deadlock. Actually we can remove lock here?
>
>
>> path 1) alloc_irq_table() -> holds iommu_table_lock [A] -> iommu->lock [B]
>> path 2) iommu_flush_irt_and_complete -> holds iommu->lock [B] ->
>> iommu_table_lock [A]
>
> You are correct, there is a possible deadlock. We can avoid it by grabbing
> iommu_table_lock, creating a local copy of the aliases, releasing it, then
> flushing the aliases. This way, there is no nested locking in
> iommu_flush_irt_and_complete.
>
> We notice that the alias table is read in other paths without holding
> iommu_table_lock (amd_iommu_change_top, setup_aliases). If we can do the same,
> then that is of course another way to avoid the deadlock. What do you think?
>
>>> +
>>> + for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
>>> + if (pci_seg->alias_table[devid] != alias)
>
>> This is heavy hammer. Why not use pci_for_each_dma_alias() like we do in DTE
>> flush path?
>
> Agree this is heavy.
>
> We cannot use pci_for_each_dma_alias. See the reply to our v1:
> https://lore.kernel.org/linux-iommu/26cfa307-6c33-41f9-a7a0-fbf202b38a00@xxxxxxx/

I went back to those reports and reviewed them again. It looks like
pci_get_domain_bus_and_slot() takes a spinlock, which is what causes the lockdep issue.

I think v2 is good w/ some changes to how we get the pdev.
Instead of pci_get_domain_bus_and_slot(), we should use dev_data (like we do in
other places).



>
> However, we have a v4 ready doing 256 loop iterations instead, since aliases
> are always on the same bus. I think that's a better approach.
> What do you think?

It's a performance-sensitive code path. Also, during the init path, DMA aliases are
set up properly. I think we should go w/ the pci_for_each_dma_alias -> flush IRTE path.


I have added below fix on top of your v2 and did some sanity tests. So far it
looks good. I have requested Dheeraj to rerun the tests.

(https://lore.kernel.org/linux-iommu/20260205140059.11857-2-magnus@xxxxxxxxxxxxxx/)


-Vasant


diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 050178cf388f..eb23b9e7bf03 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3118,12 +3118,10 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 {
 	int ret;
 	u64 data;
-	int domain = iommu->pci_seg->id;
-	unsigned int bus = PCI_BUS_NUM(devid);
-	unsigned int devfn = devid & 0xff;
 	unsigned long flags;
 	struct iommu_cmd cmd;
 	struct pci_dev *pdev = NULL;
+	struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
 
 	if (iommu->irtcachedis_enabled)
 		return;
@@ -3131,11 +3129,12 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 	data = atomic64_inc_return(&iommu->cmd_sem_val);
 	build_completion_wait(&cmd, iommu, data);
 
-	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+	if (dev_data && dev_data->dev && dev_is_pci(dev_data->dev))
+		pdev = to_pci_dev(dev_data->dev);
+
 	raw_spin_lock_irqsave(&iommu->lock, flags);
 	if (pdev) {
 		ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
-		pci_dev_put(pdev);
 	} else {
 		ret = iommu_flush_dev_irt(NULL, devid, iommu);
 	}