Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts

From: Samiullah Khawaja

Date: Tue Mar 10 2026 - 16:01:52 EST


On Fri, Mar 06, 2026 at 04:26:52PM -0400, Jason Gunthorpe wrote:
On Fri, Mar 06, 2026 at 08:22:08PM +0000, Samiullah Khawaja wrote:

But do you think doing the timeout logic without fencing would be good
enough?

It is what ARM and AMD do, so I wouldn't object to it.

I think without any back pressure to the caller, a device will be able
to fill the invalidation queue with device IOTLB invalidations that get
stuck until the HW timeout occurs.

Currently VT-d blocks itself until it gets an Invalidation Timeout
from HW, and the system ends up in a hard lockup since interrupts are
disabled.

Are you concerned that if fencing is done without an RAS flow, the
device might not be able to detect the failure (if it really needs ATS
to work)?

Yes, and then the device is badly locked because nothing will fix the
IOMMU fence.

VFIO might fix it if it is restarted, but other approaches like
rmmod/insmod won't restore the broken device.

So I'd rather see a more complete solution before we add fencing to
the iommu drivers. Minimally userspace doing a rmmod, flr, insmod
should be able to restore the device.

Then auto-FLR through RAS could sit on top of that.

I am thinking we can do the translated fence and timeout change for VT-d,
and the device can use the existing RAS mechanism to recover itself. This
way we at least make sure that the caller of the flush can reuse the
memory/IOVAs without UAFs.

Without a larger framework to unfence I think this will get devices
stuck.

Jason