[PATCH rc v8 0/8] iommu: Fix pci_dev_reset_iommu_prepare/done()
From: Nicolin Chen
Date: Fri Apr 24 2026 - 21:16:03 EST
Shuai and Kevin found a few bugs in the pci_dev_reset_iommu_prepare/done()
helpers when used to handle some corner cases:
- Nested callbacks
- Multi-device groups
- WARN_ON/UAF due to concurrent detach
Fixing these requires a substantial rework that tracks device reset state on
a per-gdev basis. This series includes a few patches addressing them. Most of
the patches were previously reviewed as a single patch in v6. As more bugs
were found during review, I split v6 into smaller patches so that each of
them is cleaner.
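For readers unfamiliar with the idea, here is a minimal userspace sketch of
per-gdev reset tracking with nested prepare()/done() calls. It is a toy model,
not the code in drivers/iommu/iommu.c: the names 'reset_depth' and 'blocked'
are borrowed from the changelog, but struct toy_gdev, the toy_* helpers, and
the exact semantics shown are assumptions made for illustration only.

```c
/* Toy userspace model; NOT the actual drivers/iommu/iommu.c code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reset state kept per group device. */
struct toy_gdev {
	unsigned int reset_depth;	/* prepare() calls not yet matched by done() */
	bool blocked;			/* true while the device is treated as blocked */
};

/* The outermost prepare() blocks the device; nested calls only bump the depth. */
static void toy_reset_prepare(struct toy_gdev *gdev)
{
	if (gdev->reset_depth++ == 0)
		gdev->blocked = true;
}

/* Only the outermost done() unblocks. Returns false on an unbalanced call. */
static bool toy_reset_done(struct toy_gdev *gdev)
{
	if (gdev->reset_depth == 0)
		return false;
	if (--gdev->reset_depth == 0)
		gdev->blocked = false;
	return true;
}

/* Usage: a nested prepare()/done() pair must not unblock the device early. */
static void toy_nested_reset_example(void)
{
	struct toy_gdev gdev = { 0, false };

	toy_reset_prepare(&gdev);	/* outer reset begins */
	toy_reset_prepare(&gdev);	/* nested callback re-enters */
	assert(toy_reset_done(&gdev));	/* inner done() */
	assert(gdev.blocked);		/* still blocked: outer reset is pending */
	assert(toy_reset_done(&gdev));	/* outer done() */
	assert(!gdev.blocked);
}
```

Keeping the state per device rather than per group means that, in this model,
one device finishing its reset does not change the state recorded for another
device in the same group.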
This is on GitHub:
https://github.com/nicolinc/iommufd/commits/fix_iommu_reset-v8
Note that a concurrent reset of two DMA alias siblings (sharing the same RID)
might unblock prematurely when one device is done while the other is still
resetting. Supporting this case is a bit convoluted. Given that it's unclear
whether real ATS devices ever share a RID, for now, add a warning in done().
Future work can fix it properly if someone hits it.
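The condition being warned about can be sketched in userspace as follows.
This is only an illustration of the scenario: struct toy_alias_dev, the toy_*
helpers, and the RID values are hypothetical, and how the series actually
detects DMA alias siblings is not described in this cover letter.

```c
/* Toy userspace model of the DMA alias case; NOT the kernel implementation. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record: the RID it uses and its reset nesting depth. */
struct toy_alias_dev {
	unsigned int rid;		/* requester ID, shared by DMA alias siblings */
	unsigned int reset_depth;	/* > 0 while this device is still resetting */
};

/*
 * Returns true if completing the reset of devs[idx] would unblock a RID that
 * another device, still in the middle of its own reset, also uses.
 */
static bool toy_done_unblocks_early(const struct toy_alias_dev *devs,
				    size_t n, size_t idx)
{
	for (size_t i = 0; i < n; i++) {
		if (i == idx)
			continue;
		if (devs[i].rid == devs[idx].rid && devs[i].reset_depth > 0)
			return true;
	}
	return false;
}

/* Usage: two siblings share RID 0x0100; a third device has its own RID. */
static void toy_alias_example(void)
{
	struct toy_alias_dev devs[] = {
		{ 0x0100, 1 },	/* sibling A, resetting */
		{ 0x0100, 1 },	/* sibling B, resetting concurrently */
		{ 0x0200, 1 },	/* unrelated device */
	};

	/* A finishing first would unblock the RID that B still relies on. */
	assert(toy_done_unblocks_early(devs, 3, 0));
	/* The unrelated device shares no RID, so nothing to warn about. */
	assert(!toy_done_unblocks_early(devs, 3, 2));
}
```

In this model a warning would fire only when two devices with the same RID are
resetting at the same time, which matches the case the note above says is left
for future work.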
Changelog
v8:
* Add Reviewed-by tags
* Fix NULL group->domain in done()
* Tidy goto cleanup when using guard()
* Update patch subject and commit message
* Add warning on premature unblocking in DMA alias cases
* Drop unreachable skip in __iommu_group_set_domain_internal() error path
v7:
https://lore.kernel.org/all/cover.1776551790.git.nicolinc@xxxxxxxxxx/
* Add Reviewed-by tags
* Split v6 into smaller patches
* Add one patch to fix UAF during detach()
* Add one patch to fix unnecessary ATS invalidation
v6:
https://lore.kernel.org/all/20260407194644.171304-1-nicolinc@xxxxxxxxxx/
* Update inline comments and commit message
* Add "max_pasids > 0" condition in both helpers
v5:
https://lore.kernel.org/all/20260404050243.141366-1-nicolinc@xxxxxxxxxx/
* Add 'blocked' to fix the iommu_driver_get_domain_for_dev() return value
v4:
https://lore.kernel.org/all/20260324014056.36103-1-nicolinc@xxxxxxxxxx/
* Rename 'reset_cnt' to 'recovery_cnt'
v3:
https://lore.kernel.org/all/20260321223930.10836-1-nicolinc@xxxxxxxxxx/
* Make prepare()/done() per-gdev
* Use reset_depth to track nested re-entries
* Replace group->resetting_domain with a reset_cnt
v2:
https://lore.kernel.org/all/20260319043135.1153534-1-nicolinc@xxxxxxxxxx/
* Fix the helpers by allowing re-entry
v1:
https://lore.kernel.org/all/20260318220028.1146905-1-nicolinc@xxxxxxxxxx/
Nicolin Chen (8):
iommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done()
iommu: Fix kdocs of pci_dev_reset_iommu_done()
iommu: Replace per-group resetting_domain with per-gdev blocked flag
iommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()
iommu: Fix nested pci_dev_reset_iommu_prepare/done()
iommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid()
iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
iommu: Warn on premature unblock during DMA aliased sibling reset
drivers/iommu/iommu.c | 223 ++++++++++++++++++++++++++++++++++--------
1 file changed, 181 insertions(+), 42 deletions(-)
--
2.43.0