Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts

From: Jason Gunthorpe

Date: Fri Mar 06 2026 - 15:27:07 EST


On Fri, Mar 06, 2026 at 08:22:08PM +0000, Samiullah Khawaja wrote:

> But do you think doing the timeout logic without fencing would be good
> enough?

It is what ARM and AMD do, so I wouldn't object to it.
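
For illustration only, here is a minimal userspace sketch of the
"timeout without fencing" idea: poll for completion with a bounded
budget and return an error instead of spinning forever. Every name in it
(fake_inv_queue, fake_inv_done, wait_inv_bounded) is hypothetical; it is
not the SMMUv3, AMD, or VT-d driver code, and the poll-count budget is an
assumed stand-in for a real time-based timeout.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a hardware invalidation queue.
 * polls_until_done < 0 models a device that never answers.
 */
struct fake_inv_queue {
	int polls_until_done;
	int polls_seen;
};

/* Simulated completion check; a real driver would read a HW register. */
static bool fake_inv_done(struct fake_inv_queue *q)
{
	q->polls_seen++;
	return q->polls_until_done >= 0 &&
	       q->polls_seen > q->polls_until_done;
}

/*
 * Bounded wait: give up after max_polls instead of blocking forever.
 * No fencing is done here; the caller only learns that the
 * invalidation did not complete in time.
 */
static int wait_inv_bounded(struct fake_inv_queue *q, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (fake_inv_done(q))
			return 0;
	}
	return -ETIMEDOUT;
}
```

With this shape a caller gets -ETIMEDOUT for an unresponsive device
rather than locking up with interrupts disabled. What the caller may
safely do with the memory afterwards is the open question in this
thread.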

> Currently VT-d blocks itself, until it gets an Invalidation Timeout
> from HW, and system ends up in a hardlockup since interrupts are
> disabled.
>
> Are you concerned that if fencing is done without an RAS flow, the
> device might not be able to detect the failure (if it really needs ATS
> to work)?

Yes, and then the device is badly locked because nothing will fix the
IOMMU fence.

VFIO might fix it if it is restarted, but other approaches like
rmmod/insmod won't restore the broken device.

So I'd rather see a more complete solution before we add fencing to
the iommu drivers. Minimally, userspace doing a rmmod, FLR, insmod
should be able to restore the device.

Then auto-FLR through RAS could sit on top of that.

> I am thinking, we can do translated fence and timeout change for VT-d.
> And the device can use existing RAS mechanism to recover itself. This
> way we at least make sure that caller of flush can reuse the memory/IOVAs
> without UAFs.

Without a larger framework to unfence, I think this will get devices
stuck.

Jason