Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts

From: Baolu Lu

Date: Thu Mar 05 2026 - 22:23:52 EST


On 3/5/26 23:39, Jason Gunthorpe wrote:
> On Wed, Mar 04, 2026 at 09:21:42PM -0800, Nicolin Chen wrote:
> > +	/*
> > +	 * ATC timeout indicates the device has stopped responding to coherence
> > +	 * protocol requests. The only safe recovery is a reset to flush stale
> > +	 * cached translations. Note that pci_reset_function() internally calls
> > +	 * pci_dev_reset_iommu_prepare/done() as well and ensures to block ATS
> > +	 * if PCI-level reset fails.
> > +	 */
> > +	if (!pci_reset_function(pdev)) {
> > +		/*
> > +		 * If reset succeeds, set BME back. Otherwise, fence the system
> > +		 * from a faulty device, in which case user will have to replug
> > +		 * the device to invoke pci_set_master().
> > +		 */
> > +		pci_dev_lock(pdev);
> > +		pci_set_master(pdev);
> > +		pci_dev_unlock(pdev);
> > +	}
>
> I thought we talked about this, the iommu driver cannot just blindly
> issue a reset like this, the reset has to come from the actual device
> driver through the AERish mechanism. Otherwise the driver RAS is going
> to explode.
>
> The smmu driver should immediately block the STE (reject translated
> requests) to protect the system before resuming whatever command
> submission has encountered the error.
>
> You could delegate the STE change to the interrupted command
> submission to avoid doing it from an ISR, that makes a lot of sense
> because the submission thread is already operating a cmdq so it could
> stick in a STE invalidation command, possibly even in place of the
> failed ATC command.
>
> I think I'd break this up into smaller steps, just focus on this STE
> mechanism at start and have any future attach callback fix the STE.
>
> Then we can talk about how to properly trigger the PCI RAS flow and so
> on.

I believe this issue is not unique to the arm-smmu-v3 driver. Device ATC
invalidation timeout is a generic challenge across all IOMMU
architectures that support PCI ATS. Would it be feasible to implement a
common 'fencing and recovery' mechanism in the IOMMU core so that all
IOMMU drivers could benefit?
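
To make the question a bit more concrete, below is a rough user-space
model of the shape such a mechanism might take. It is only a sketch:
every name in it (mock_dev, mock_iommu_ops, block_translated,
mock_iommu_fence_device, mock_atc_invalidate, mock_attach) is
hypothetical and none of them exist in the IOMMU core today. It assumes
each driver can supply a hook that rejects translated requests (for
arm-smmu-v3 that would presumably be the STE change described above),
that the fence is applied from the command submission path rather than
from an ISR, and that no reset is issued by the IOMMU side.

```c
/* User-space model only; every name below is hypothetical. */
#include <stdbool.h>

struct mock_dev;

/* Per-driver hook: make the hardware reject translated (ATS) requests. */
struct mock_iommu_ops {
	void (*block_translated)(struct mock_dev *dev);
};

struct mock_dev {
	const struct mock_iommu_ops *ops;
	bool ats_blocked;	/* set by the driver hook */
	bool fenced;		/* core state, cleared by a later attach */
	bool atc_responsive;	/* models whether the device answers ATC invalidation */
};

/* Stand-in for a driver hook, e.g. rewriting an SMMUv3 STE. */
static void mock_block_translated(struct mock_dev *dev)
{
	dev->ats_blocked = true;
}

static const struct mock_iommu_ops mock_ops = {
	.block_translated = mock_block_translated,
};

/* Common "core" entry point: fence the device, no reset issued here. */
static void mock_iommu_fence_device(struct mock_dev *dev)
{
	if (dev->fenced)
		return;
	dev->ops->block_translated(dev);
	dev->fenced = true;
}

/*
 * Models the command submission path. Returns 0 on success and -1 when
 * the ATC invalidation times out; in the timeout case the submitting
 * thread fences the device before returning.
 */
static int mock_atc_invalidate(struct mock_dev *dev)
{
	if (dev->atc_responsive)
		return 0;
	mock_iommu_fence_device(dev);
	return -1;
}

/* A later attach, after the device driver's own recovery, lifts the fence. */
static void mock_attach(struct mock_dev *dev)
{
	dev->fenced = false;
	dev->ats_blocked = false;
}
```

In this model the actual device recovery stays with the device driver
and the PCI error handling flow; the core only records the fence and
relies on a subsequent attach to restore translation.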

Thanks,
baolu