Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
From: Jason Gunthorpe
Date: Fri Mar 06 2026 - 15:27:07 EST
On Fri, Mar 06, 2026 at 08:22:08PM +0000, Samiullah Khawaja wrote:
> But do you think doing the timeout logic without fencing would be good
> enough?
It is what ARM and AMD do, so I wouldn't object to it.
> Currently VT-d blocks until it gets an Invalidation Timeout
> from HW, and the system ends up in a hard lockup since interrupts are
> disabled.
>
> Are you concerned that if fencing is done without an RAS flow, the
> device might not be able to detect the failure (if it really needs ATS
> to work)?
Yes, and then the device is stuck, because nothing will undo the
IOMMU fence.
VFIO might fix it if it is restarted, but other approaches like
rmmod/insmod won't restore the broken device.
So I'd rather see a more complete solution before we add fencing to
the iommu drivers. At a minimum, userspace doing an rmmod, FLR, insmod
sequence should be able to restore the device.
Then auto-FLR through RAS could sit on top of that.
> I am thinking we could do the translated fence and the timeout change
> for VT-d, and the device can use the existing RAS mechanism to recover
> itself. This way we at least make sure that the caller of the flush can
> reuse the memory/IOVAs without UAFs.
Without a larger framework to unfence, I think this will leave devices
stuck.
Jason