Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
From: Nicolin Chen
Date: Wed Mar 18 2026 - 15:27:42 EST
On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > Sent: Wednesday, March 18, 2026 3:16 AM
> >
> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
> > aware of the timeout via a GERROR interrupt, the driver thread issuing the
> > commands lacks a direct mechanism to verify whether its specific batch was
> > the cause or not, as polling the CMD_SYNC status doesn't natively return a
> > failure code, making it very difficult to coordinate per-device recovery.
> >
> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
> > physical CMDQ index of the faulting CMD_SYNC command.
> >
>
> It's nice to see the ability of allowing sw to identify the faulting sync command
> upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
> (similar to CMD_SYNC) are in-fly... :/

Actually, the SMMU doesn't know which device is faulting when a CMD_SYNC
follows ATC_INV commands for multiple devices. The commit message
in PATCH-7 describes this at the end. So Jason suggested retrying
those ATC_INV commands by bisecting them per-device, which allows
us to pinpoint the faulting device.
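
To illustrate the per-device retry idea, here is a rough userspace
simulation. It is only a sketch: struct fake_dev, issue_atc_inv_sync()
and find_faulting_devs() are hypothetical names made up for this example
and are not the code from the series. It assumes that a batch sync can
only report "some ATC_INV before this CMD_SYNC timed out", and that
re-issuing the invalidation one device at a time, each with its own
CMD_SYNC, identifies the culprit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ATS-capable device; not a driver structure. */
struct fake_dev {
	int sid;		/* stream ID */
	bool atc_stuck;		/* simulates a device that never completes ATC_INV */
};

/*
 * Hypothetical helper: models issuing ATC_INV for devs[0..n) followed by
 * a single CMD_SYNC. Returns true if that sync saw an ATC timeout. The
 * caller only learns that the batch failed, not which device caused it.
 */
static bool issue_atc_inv_sync(const struct fake_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].atc_stuck)
			return true;
	return false;
}

/*
 * Issue the whole batch first. Only if it times out, retry the
 * invalidation one device per CMD_SYNC and record the stream IDs that
 * time out again. Returns the number of stream IDs stored in bad[].
 */
static size_t find_faulting_devs(const struct fake_dev *devs, size_t n,
				 int *bad, size_t max_bad)
{
	size_t nbad = 0;

	assert(bad != NULL || max_bad == 0);

	if (!issue_atc_inv_sync(devs, n))
		return 0;	/* fast path: the batch completed */

	for (size_t i = 0; i < n; i++)
		if (issue_atc_inv_sync(&devs[i], 1) && nbad < max_bad)
			bad[nbad++] = devs[i].sid;

	return nbad;
}
```

The batch-first ordering keeps the common case at one CMD_SYNC and only
pays for the per-device syncs after a timeout has actually been seen.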

Could VT-d do the same?

Nicolin