[PATCH rc v7 4/6] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
From: Nicolin Chen
Date: Sat Apr 18 2026 - 19:43:53 EST
Shuai found that cxl_reset_bus_function() internally calls
pci_reset_bus_function(), and both call pci_dev_reset_iommu_prepare/done().
Since pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner
call triggers a WARN_ON and returns -EBUSY, causing the entire device
reset to fail.
On the other hand, removing the outer calls from the PCI callers is
unsafe. As Kevin pointed out, device-specific quirks like
reset_hinic_vf_dev() execute custom firmware waits after their inner
pcie_flr() completes. If the IOMMU protection relied solely on the inner
reset, the IOMMU would be unblocked prematurely while the device is still
resetting.
Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant:
introduce gdev->reset_depth to count re-entries on the same device, so
only the outermost prepare() blocks the IOMMU and only the matching
outermost done() unblocks it.
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@xxxxxxxxx>
Reviewed-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
---
drivers/iommu/iommu.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e9ffa562b614f..ff181db687bbf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -82,6 +82,7 @@ struct group_device {
* - Device is undergoing a reset
*/
bool blocked;
+ unsigned int reset_depth;
};
/* Iterate over each struct group_device in a struct iommu_group */
@@ -3997,20 +3998,19 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
if (WARN_ON(!gdev))
return -ENODEV;
- /* Re-entry is not allowed (will be fixed in a following patch) */
- if (WARN_ON(gdev->blocked))
- return -EBUSY;
+ if (gdev->reset_depth++)
+ return 0;
ret = __iommu_group_alloc_blocking_domain(group);
if (ret)
- return ret;
+ goto err_depth;
/* Stage RID domain at blocking_domain while retaining group->domain */
if (group->domain != group->blocking_domain) {
ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
group->domain);
if (ret)
- return ret;
+ goto err_depth;
}
/*
@@ -4037,6 +4037,10 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
group->recovery_cnt++;
return ret;
+
+err_depth:
+ gdev->reset_depth--;
+ return ret;
}
EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
@@ -4070,7 +4074,10 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
if (WARN_ON(!gdev))
return;
- if (!gdev->blocked)
+ /* Unbalanced done() calls would underflow the counter */
+ if (WARN_ON(gdev->reset_depth == 0))
+ return;
+ if (--gdev->reset_depth)
return;
if (WARN_ON(!group->blocking_domain))
--
2.43.0