Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
From: Jason Gunthorpe
Date: Fri Mar 06 2026 - 14:44:22 EST
On Fri, Mar 06, 2026 at 07:35:19PM +0000, Samiullah Khawaja wrote:
> On Fri, Mar 06, 2026 at 09:00:06AM -0400, Jason Gunthorpe wrote:
> > On Fri, Mar 06, 2026 at 11:22:52AM +0800, Baolu Lu wrote:
> > > I believe this issue is not unique to the arm-smmu-v3 driver. Device ATC
> > > invalidation timeout is a generic challenge across all IOMMU
> > > architectures that support PCI ATS. Would it be feasible to implement a
> > > common 'fencing and recovery' mechanism in the IOMMU core so that all
> > > IOMMU drivers could benefit?
> >
> > I think yes, for parts, but the driver itself has to do something deep
> > inside its invalidation to allow the flush to complete without
> > exposing the system to memory corruption: it has to block
> > translated requests before completing the flush.
>
> Yes, and currently the underlying drivers define software timeouts
> (AMD: 100 ms, arm-smmu-v3: 1 s) that can expire before the actual
> ATC invalidation timeout occurs. Do you think the timeout needs to be
> propagated to the caller (the flush callback) so that the memory/IOVA
> is not allocated to something else?
No, definitely not. That's basically impossible: so many callers just
can't handle such an idea, and you can never fully recover from such
a thing.
> Or would blocking translated requests for such devices be enough?
Yes, we have to fence the hardware and then allow the existing SW
stack to continue without any fear of use-after-free (UAF) from the
broken HW. Fencing the HW means using the IOMMU to block translated
requests.
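A minimal, self-contained C model of the fence-then-complete ordering
described above, offered only as an illustration. Every name here
(struct ats_endpoint, atc_flush(), fence_endpoint(),
wait_for_atc_inv_completion(), ATC_INV_TIMEOUT_US) is hypothetical and
does not correspond to arm-smmu-v3 or IOMMU core code. The simulated
wait stands in for polling the real invalidation completion, and
fencing is modeled as a flag where a real driver would change the
device's IOMMU configuration so translated requests are rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one ATS-capable endpoint as seen by the IOMMU. */
struct ats_endpoint {
	bool ats_enabled;            /* IOMMU currently honours translated requests */
	bool fenced;                 /* endpoint was fenced after an ATC timeout */
	unsigned int inv_latency_us; /* simulated time the device needs to ack */
};

/* Hypothetical software timeout (1 s); real drivers pick their own value. */
#define ATC_INV_TIMEOUT_US 1000000u

/* Simulated wait for the ATC invalidation completion; true if acked in time. */
static bool wait_for_atc_inv_completion(const struct ats_endpoint *ep,
					unsigned int timeout_us)
{
	return ep->inv_latency_us <= timeout_us;
}

/*
 * Fence the endpoint: stop honouring its translated requests. In a real
 * driver this would be an update to the device's IOMMU table entry; here
 * it only flips flags (assumption for illustration).
 */
static void fence_endpoint(struct ats_endpoint *ep)
{
	ep->ats_enabled = false;
	ep->fenced = true;
}

/*
 * Flush the device ATC. Deliberately returns nothing: if the device does
 * not answer in time, translated requests are blocked *before* returning,
 * so the caller can safely free or reuse the IOVA and memory.
 */
static void atc_flush(struct ats_endpoint *ep)
{
	if (ep->fenced)
		return; /* translated requests are already blocked */
	if (!wait_for_atc_inv_completion(ep, ATC_INV_TIMEOUT_US))
		fence_endpoint(ep);
}

/* IOMMU-side check applied to an incoming translated request. */
static bool iommu_accepts_translated_request(const struct ats_endpoint *ep)
{
	return ep->ats_enabled;
}
```

The flush never reports an error, matching the point that callers
cannot handle one; once an endpoint is fenced, later flushes return
immediately because stale ATC entries can no longer be used.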
Jason