Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
From: Nicolin Chen
Date: Mon Mar 23 2026 - 21:22:14 EST
On Mon, Mar 23, 2026 at 08:57:56PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
>
> > If the software times out first at 1s, it means the CMDQ is still
> > waiting for the ATC invalidation to complete. Then, the caller
> > sees -ETIMEDOUT and tries to bisect the ATC batch or update the
> > STE directly, either of which involves the CMDQ. But the CMDQ has
> > not recovered yet.
>
> Yeah, I don't know if the SW timeout flow is really all that RASy here
> right now. Without somehow recovering the CMDQ it is pointless to try
> to continue after a timeout.
>
> And we are really in trouble if things like normal IOTLB invalidation
> start to fail.
>
> I think the right thing is to somehow try to recover the cmdq and then
> restart it on the commands that haven't been SYNC'd yet and just keep
> trying, maybe with progressively longer timeouts.
>
> Just ignoring the error and continuing doesn't seem safe.
>
> But that's something else again, as long as ATC invalidation reliably
> hits the HW timeout first we should be OK to ignore it in this
> series.
Yeah. I will leave a FIXME inline for now.
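
For reference, the "recover the cmdq, restart from the commands that
haven't been SYNC'd yet, and keep trying with progressively longer
timeouts" idea quoted above could look roughly like the userspace sketch
below. This is only an illustration under assumptions: struct fake_cmdq,
fake_cmdq_wait_sync(), fake_cmdq_recover() and
fake_cmdq_sync_with_backoff() are hypothetical names that do not exist in
the driver, the stall is simulated with a counter, the timeout doubling
and the retry cap are arbitrary choices, and nothing here touches real
hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a command queue; NOT the driver's structure. */
struct fake_cmdq {
	unsigned int prod;        /* commands issued so far */
	unsigned int synced;      /* commands covered by a completed SYNC */
	unsigned int replay_from; /* index a restart would resume from */
	unsigned int stall_left;  /* simulation only: waits that still time out */
};

/* Simulated SYNC wait: fails until the simulated stall has drained. */
static int fake_cmdq_wait_sync(struct fake_cmdq *q, uint64_t timeout_us)
{
	(void)timeout_us; /* the simulation does not actually sleep */

	if (q->stall_left) {
		q->stall_left--;
		return -ETIMEDOUT;
	}
	q->synced = q->prod;
	return 0;
}

/* Simulated recovery: mark the first command not yet covered by a SYNC. */
static void fake_cmdq_recover(struct fake_cmdq *q)
{
	assert(q->synced <= q->prod);
	q->replay_from = q->synced;
}

/*
 * Wait for a SYNC, recovering the queue and doubling the timeout after
 * each failure. Gives up after max_tries attempts (an assumed cap).
 * *last_timeout_us reports the timeout used by the final attempt.
 */
static int fake_cmdq_sync_with_backoff(struct fake_cmdq *q,
				       uint64_t base_timeout_us,
				       unsigned int max_tries,
				       uint64_t *last_timeout_us)
{
	uint64_t timeout_us = base_timeout_us;
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		*last_timeout_us = timeout_us;
		if (!fake_cmdq_wait_sync(q, timeout_us))
			return 0;

		/* Timed out: recover first, then retry with a longer wait. */
		fake_cmdq_recover(q);
		timeout_us *= 2;
	}

	return -ETIMEDOUT;
}
```

The point of the sketch is only the ordering: the caller never sees the
timeout until the queue has been recovered and the retry budget is used
up, so it does not go on to issue more commands against a stuck queue.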
Nicolin