Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts

From: Samiullah Khawaja

Date: Fri Mar 06 2026 - 14:35:54 EST


On Fri, Mar 06, 2026 at 09:00:06AM -0400, Jason Gunthorpe wrote:
On Fri, Mar 06, 2026 at 11:22:52AM +0800, Baolu Lu wrote:
I believe this issue is not unique to the arm-smmu-v3 driver. Device ATC
invalidation timeout is a generic challenge across all IOMMU
architectures that support PCI ATS. Would it be feasible to implement a
common 'fencing and recovery' mechanism in the IOMMU core so that all
IOMMU drivers could benefit?

I think yes, in part, but the driver itself has to do something deep
inside its invalidation path to allow the flush to complete without
exposing the system to memory corruption, meaning it has to block
translated requests before completing the flush.

Yes, and currently the underlying drivers define software timeouts
(AMD: 100 ms, arm-smmu-v3: 1 s) that could expire before the actual
ATC invalidation timeout occurs. Do you think the timeout needs to be
propagated to the caller (the flush callback) so the memory/IOVA is not
reallocated to something else? Or is blocking translated requests for
such devices enough?
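A minimal sketch of the "propagate the timeout to the caller" alternative, assuming a poll budget stands in for the software timeout. `struct fake_cmdq`, `fake_poll_once()` and `fake_sync_with_timeout()` are hypothetical names; only `ETIMEDOUT` is a real errno value. The "slow" case in the test shows a device that would eventually finish but is cut off by a budget that is too short.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: the device completes the invalidation after
 * `polls_needed` polls, or never if `polls_needed` is negative. */
struct fake_cmdq {
	int polls_needed;
};

/* Returns 1 once the invalidation has completed, 0 otherwise. */
static int fake_poll_once(struct fake_cmdq *q)
{
	if (q->polls_needed < 0)
		return 0; /* hung device: never completes */
	if (q->polls_needed > 0)
		q->polls_needed--;
	return q->polls_needed == 0;
}

/*
 * Wait up to `budget` polls (a stand-in for a software timeout such as the
 * 100 ms / 1 s values above). Returns 0 on completion, or -ETIMEDOUT so the
 * caller could decide not to release the IOVA/memory yet.
 */
static int fake_sync_with_timeout(struct fake_cmdq *q, int budget)
{
	assert(budget >= 0);
	for (int i = 0; i < budget; i++)
		if (fake_poll_once(q))
			return 0;
	return -ETIMEDOUT;
}
```

The catch with this design is that every caller of the flush must then handle the error, which is exactly what is hard in the contexts these flushes run in.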

I don't see how that can be generalized much, since we are running
this flush code in IRQ and reclaim contexts; it has to be very small
and targeted, without memory allocation or sleeping locks.


Jason