Re: [PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device

From: Shuai Xue

Date: Wed Mar 18 2026 - 07:48:18 EST




On 3/18/26 3:15 AM, Nicolin Chen wrote:
> When an IOMMU hardware detects an error due to a faulty device (e.g. an ATS
> invalidation timeout), IOMMU drivers may quarantine the device by disabling
> specific hardware features or dropping translation capabilities.
>
> However, the core-level states of the faulty device are out of sync, as the
> device can be still attached to a translation domain or even potentially be
> moved to a new domain that might overwrite the driver-level quarantine.
>
> Given that such an error can be likely an ISR, introduce a broken_work per
> iommu_group, and add a helper function to allow driver to report the broken
> device, so as to completely quarantine it in the core.
>
> Use the existing pci_dev_reset_iommu_prepare() function to shift the device
> to its resetting_domain/blocking_domain. A later pci_dev_reset_iommu_done()
> call will clear it and move it out of the quarantine.
>
> Signed-off-by: Nicolin Chen <nicolinc@xxxxxxxxxx>
> ---
>  include/linux/iommu.h |  2 ++
>  drivers/iommu/iommu.c | 59 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9ba12b2164724..9b5f94e566ff9 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -891,6 +891,8 @@ static inline struct iommu_device *__iommu_get_iommu_dev(struct device *dev)
>  #define iommu_get_iommu_dev(dev, type, member) \
>  	container_of(__iommu_get_iommu_dev(dev), type, member)
> +void iommu_report_device_broken(struct device *dev);
> +

This declaration sits inside the #ifdef CONFIG_IOMMU_API section, but
there is no corresponding stub in the #else block. The current
callers (arm-smmu-v3) always have CONFIG_IOMMU_API enabled, so nothing
breaks today, but for API completeness please add a static inline
no-op stub to the #else block.
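
Something along these lines would do. This is only a userspace sketch of
the header pattern, not the actual hunk: struct device is mocked,
CONFIG_IOMMU_API is deliberately left undefined so the #else branch is
the one compiled, and demo_handle_ats_timeout() is a hypothetical caller
that does not exist in the patch.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct device, for illustration only. */
struct device {
	const char *name;
};

/* CONFIG_IOMMU_API is intentionally not defined in this sketch. */
#ifdef CONFIG_IOMMU_API
void iommu_report_device_broken(struct device *dev);
#else
/* Stub: without the IOMMU core there is nothing to quarantine. */
static inline void iommu_report_device_broken(struct device *dev)
{
	(void)dev;
}
#endif

/* Hypothetical caller: builds the same way with or without the option. */
static int demo_handle_ats_timeout(struct device *dev)
{
	if (!dev)
		return -1;

	iommu_report_device_broken(dev);
	return 0;
}
```

With the stub in place, a caller that is not gated on CONFIG_IOMMU_API
still links, and the call compiles away to nothing.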

Thanks.
Shuai