Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

From: Jason Gunthorpe

Date: Mon Mar 23 2026 - 19:58:14 EST


On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:

> If the software times out first at 1s, it means the CMDQ is still
> pending on wait for the completion of ATC invalidation. Then, the
> caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
> the STE directly, either of which involves CMDQ. But CMDQ has not
> recovered yet.

Yeah, I don't know if the SW timeout flow is really all that RASy here
right now. Without somehow recovering the CMDQ it is pointless to try
to continue after a timeout.

And we are really in trouble if things like normal IOTLB invalidation
start to fail.

I think the right thing is to somehow try to recover the cmdq and then
restart it on the commands that haven't been SYNC'd yet and just keep
trying, maybe with progressively longer timeouts.
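
To make that concrete, here is a rough user-space sketch of the retry
shape, not driver code. Everything in it is hypothetical: struct
fake_cmdq, fake_cmdq_sync(), fake_cmdq_recover() and cmdq_sync_retry()
do not exist in arm-smmu-v3, the "hardware" is modeled as a single
threshold, and the retry bound and timeout cap are assumptions added so
the loop terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue model; NOT the arm-smmu-v3 data structures. */
struct fake_cmdq {
	unsigned int prod;        /* next slot software will fill */
	unsigned int synced;      /* commands below this index have been SYNC'd */
	unsigned int hw_ready_ms; /* model: a SYNC completes only if timeout >= this */
	unsigned int recoveries;  /* how many times recovery was run */
};

/* Hypothetical: wait up to timeout_ms for a SYNC covering [synced, prod). */
static bool fake_cmdq_sync(struct fake_cmdq *q, unsigned int timeout_ms)
{
	if (timeout_ms < q->hw_ready_ms)
		return false;	/* software timeout fired first */
	q->synced = q->prod;
	return true;
}

/*
 * Hypothetical: bring the queue back to a usable state. The commands in
 * [synced, prod) are kept so that the next SYNC attempt covers them again.
 */
static void fake_cmdq_recover(struct fake_cmdq *q)
{
	q->recoveries++;
}

/*
 * On timeout: recover the queue, retry the un-SYNC'd commands, and double
 * the timeout up to max_ms. Returns the timeout that finally succeeded,
 * or 0 if max_tries attempts all timed out (assumed bound, see above).
 */
static unsigned int cmdq_sync_retry(struct fake_cmdq *q, unsigned int base_ms,
				    unsigned int max_ms, unsigned int max_tries)
{
	unsigned int timeout = base_ms;
	unsigned int i;

	assert(base_ms > 0 && base_ms <= max_ms);

	for (i = 0; i < max_tries; i++) {
		if (fake_cmdq_sync(q, timeout))
			return timeout;
		fake_cmdq_recover(q);
		/* Double without overflowing, then clamp to max_ms. */
		timeout = (timeout > max_ms / 2) ? max_ms : timeout * 2;
	}
	return 0;
}
```

The point of the shape is that the caller never sees a timeout while
commands are still un-SYNC'd unless recovery itself has given up, so
nothing proceeds on the assumption that an invalidation completed.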

Just ignoring the error and continuing doesn't seem safe.

But that's something else again. As long as ATC invalidation reliably
hits the HW timeout first, we should be OK to ignore it in this
series.

Jason