Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
From: Pranjal Shrivastava
Date: Tue Mar 10 2026 - 15:41:09 EST

On Fri, Mar 06, 2026 at 09:02:02AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 05, 2026 at 09:06:17PM -0800, Nicolin Chen wrote:
> > On Thu, Mar 05, 2026 at 09:33:47PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Mar 05, 2026 at 05:29:22PM -0800, Nicolin Chen wrote:
> > >
> > > > But arm_smmu_cmdq_issue_cmdlist() doesn't know when to push another
> > > > CMD. In my case where ATC_INV irq occurs, the return value from the
> > > > arm_smmu_cmdq_poll_until_sync() in the Step 5 is 0, and prods/cons
> > > > are also matched. Actually, at this point that NOP ISR has already
> > > > finished.
> > >
> > > Yes, you'd need a sneaky way to convey the error from the ISR to the
> > > cmdlist code that didn't harm performance. Maybe we could come up with
> > > something, but if it works replacing the NOP with flush sounds fairly
> > > appealing - though can you do a single WORD edit to the STE that will
> > > block translated requests? Zero EATS?
> >
> > Yea. I can give that a try.
>
> This also really needs to go after the invalidation changes because it
> is feasible to also edit the lockless RCU invalidation list from the
> ISR and disable the ATC for the failed device too.
>
> > > Also, will the SMMU start spamming with blocked translation events or
> > > something that will need suppression too?
> >
> > CD.R=0 can suppress fault records, but we would need to override
> > that in every CD of the device.
>
> That's too much to do from ISR, but maybe we can do it from a WQ..
>

(Skimming through these, so apologies if I'm missing context.) Shouldn't
we do all of that (turning the STE into an invalid / abort STE and
suppressing the faults) in the worker, instead of trying to
reset/recover the device?
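
To make the ISR/worker split concrete, here is a rough user-space sketch
of the two steps discussed above: a single-word STE edit that zeroes
EATS, and a walk over every CD that clears CD.R. It is not driver code.
The struct and function names (sketch_*) are made up for illustration,
and the bit positions are my recollection of STRTAB_STE_1_EATS and
CTXDESC_CD_0_R in arm-smmu-v3.h, so please treat them as assumptions.
A real version would also need WRITE_ONCE() on the __le64 words plus the
CFGI_STE / CFGI_CD commands and a CMD_SYNC, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field layout, modelled on my reading of arm-smmu-v3.h. */
#define SKETCH_STE_1_EATS_MASK	(UINT64_C(3) << 28)	/* STE word 1, bits [29:28] */
#define SKETCH_CD_0_R		(UINT64_C(1) << 45)	/* CD word 0, bit 45 */

/* Hypothetical stand-ins for the 64-byte STE and CD. */
struct sketch_ste { uint64_t data[8]; };
struct sketch_cd  { uint64_t data[8]; };

/*
 * ISR-sized step: touch exactly one 64-bit word of the STE. With
 * EATS == 0 the SMMU no longer accepts ATS translated requests from
 * the device; every other field in the word is left untouched.
 */
static void sketch_ste_block_ats(struct sketch_ste *ste)
{
	assert(ste != NULL);
	ste->data[1] &= ~SKETCH_STE_1_EATS_MASK;
}

/*
 * Worker-sized step: clear R in every CD of the device so that fault
 * records are suppressed. Returns how many CDs were actually changed.
 */
static size_t sketch_cds_suppress_faults(struct sketch_cd *cds, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		if (cds[i].data[0] & SKETCH_CD_0_R) {
			cds[i].data[0] &= ~SKETCH_CD_0_R;
			changed++;
		}
	}
	return changed;
}
```

The point of the split is only that the first helper is a bounded
single-word update, while the second scales with the number of CDs,
which is why it looks like worker material to me.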
> Jason

Thanks
Praan