Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
From: Robin Murphy
Date: Fri Mar 06 2026 - 08:23:27 EST
On 2026-03-05 11:41 pm, Jason Gunthorpe wrote:
On Thu, Mar 05, 2026 at 01:15:45PM -0800, Nicolin Chen wrote:
You mean in arm_smmu_cmdq_issue_cmdlist() that issued the timed
out ATC command?
Yes, it was my offhand thought.
So my test case was to trigger a device fault followed by an ATC
command. But, I found that the ATC command submission returned 0
while only the ISR received:
CMDQ error (cons 0x03000003): ATC invalidate timeout
arm_smmu_debugfs_atc_write: ATC_INV ret=0
It seems difficult to insert a CMDQ_OP_CFGI_STE in the submission
thread?
I didn't look, but I thought the CMDQ stops on the ATC invalidation,
flags the error, and the ISR NOPs the failing CMDQ entry and restarts
it to resume the thread? Is that something else?
If so, you could insert the STE flush instead of a NOP.
Nope, sadly the timeout is asynchronous, and CERROR_ATC_INV_SYNC is only reported on the *next* CMD_SYNC - it can't even tell us which CMD_ATC_INV(s) had a problem. Also there is no NOP; currently the only command rewriting we do is for CERROR_ILL, where we turn the illegal command into a CMD_SYNC.
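To illustrate the distinction above, here is a minimal user-space sketch, not the driver code. All toy_* names, the enum values and the queue layout are hypothetical and chosen only for illustration. It models the two cases described: for an illegal command, the entry CONS stopped on is rewritten into a sync, whereas for an ATC invalidate timeout CONS points at the CMD_SYNC rather than at any CMD_ATC_INV, so there is nothing sensible to rewrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names and values are illustrative, not the real encodings. */
enum toy_op { TOY_OP_ILLEGAL, TOY_OP_TLBI, TOY_OP_ATC_INV, TOY_OP_SYNC };

enum toy_cerror {
	TOY_CERROR_NONE,
	TOY_CERROR_ILL,
	TOY_CERROR_ABT,
	TOY_CERROR_ATC_INV_SYNC,
};

#define TOY_QUEUE_ENTRIES 8

struct toy_cmdq {
	enum toy_op ent[TOY_QUEUE_ENTRIES];
	size_t cons;	/* index of the entry the "hardware" stopped on */
};

/*
 * Error handler sketch. Returns true if the entry at cons was rewritten.
 * Only an illegal command is rewritten (into a sync); every other error
 * leaves the queue contents untouched.
 */
static bool toy_cmdq_skip_err(struct toy_cmdq *q, enum toy_cerror err)
{
	assert(q->cons < TOY_QUEUE_ENTRIES);

	switch (err) {
	case TOY_CERROR_ILL:
		/* Replace the offending command so the queue can move on. */
		q->ent[q->cons] = TOY_OP_SYNC;
		return true;
	case TOY_CERROR_ATC_INV_SYNC:
		/*
		 * cons is at the sync, not at the invalidation(s) that timed
		 * out, and the error does not identify them.
		 */
	case TOY_CERROR_ABT:
	case TOY_CERROR_NONE:
	default:
		return false;
	}
}
```

In this model, calling toy_cmdq_skip_err() with TOY_CERROR_ATC_INV_SYNC while cons indexes a TOY_OP_SYNC entry leaves every earlier TOY_OP_ATC_INV entry as it was, which is why there is no failing entry to swap for an STE flush.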
We couldn't necessarily rely on being able to rewind the hardware CONS pointer from a CMD_SYNC, as by that point we're likely to have observed it and updated llq->cons, such that other threads could move llq->prod forward and fill that space with new commands.
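The rewind hazard can be shown with a small ring-buffer model. This is a sketch under simplifying assumptions, not the driver's queue: the toy_ring type and its helpers are hypothetical, and it uses free-running counters instead of the real wrap-bit encoding. The point is only that free space is computed from the software copy of cons, so once that copy has advanced, a producer may legitimately overwrite slots that a rewound hardware CONS would then re-execute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ENTRIES 4u	/* power of two, so unsigned wrap-around is harmless */

struct toy_ring {
	uint32_t ent[TOY_ENTRIES];
	uint32_t prod;		/* free-running producer count */
	uint32_t shadow_cons;	/* software's last observed consumer count */
};

/* Producers compute free space from the shadow cons, not from hardware. */
static bool toy_ring_push(struct toy_ring *r, uint32_t cmd)
{
	if (r->prod - r->shadow_cons == TOY_ENTRIES)
		return false;	/* full, as far as software knows */

	r->ent[r->prod % TOY_ENTRIES] = cmd;
	r->prod++;
	return true;
}

/* Software reads the hardware CONS and publishes it to other threads. */
static void toy_ring_observe_cons(struct toy_ring *r, uint32_t hw_cons)
{
	r->shadow_cons = hw_cons;
}

/* What the hardware would fetch if its CONS were set to idx. */
static uint32_t toy_ring_peek(const struct toy_ring *r, uint32_t idx)
{
	return r->ent[idx % TOY_ENTRIES];
}
```

For example, after filling the ring with four commands and observing a consumer count of 2, the next push lands in slot 0. Rewinding the hardware to index 0 at that point would fetch the new command, not the one originally issued there.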
Thanks,
Robin.
Otherwise the arm_smmu_cmdq_issue_cmdlist() can just push another CMD
to the queue and sync, it is obviously in a context that can do that.
Jason