Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag

From: Nicolin Chen

Date: Mon Jun 01 2026 - 16:42:06 EST


On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > each batch is built from domain->invs so it can carry the "invs";
> > if the batch times out, we can immediately mutate its ATS entries.
> >
> > But I realized a limitation. E.g., if a device attaches to two SVA
> > domains on two SSIDs. An invalidation timing out on one of the SVA
> > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > the other SVA domain's invs?
>
> You'd have to mark all the S1's sharing the STE.

That would be a bit convoluted as we would have to go through all
other domains' invs arrays.

A master (that timed out an ATC_INV) might be attached to multiple
domains (RID, SVA1, SVA2, ...). Also, we currently don't have any
per-master reverse-tracking to its attached domains (master_domain
is added to smmu_domain->devices list only for now).

So, two things would be needed on top of what we currently have.

Firstly, we would need another per-master list tracking all the
attached smmu_domains. Maybe reuse master_domain? Let's call this
master->master_domains for now.
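
To make the shape concrete, here is a minimal userspace sketch of that
reverse tracking. Everything in it is a simplified, hypothetical stand-in
(the struct layouts, the "sid" field, master_mark_ats_broken()), not the
driver's real definitions, and locking is deliberately left out:

```c
#include <stddef.h>

/* Simplified stand-ins for the invalidation entry types (assumption). */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct inv {
	enum inv_type type;
	unsigned int sid;	/* StreamID, only meaningful for ATS entries */
};

struct smmu_domain {
	struct inv *invs;
	size_t num_invs;
};

/* One node per attachment (RID, SVA1, SVA2, ...), linked off the master. */
struct master_domain {
	struct smmu_domain *domain;
	struct master_domain *next;
};

struct master {
	unsigned int sid;
	struct master_domain *master_domains;	/* hypothetical reverse list */
};

/*
 * After an ATC_INV timeout, visit every domain the master is attached to
 * and retype its own ATS entries. Returns the number of entries changed.
 */
static size_t master_mark_ats_broken(struct master *master)
{
	struct master_domain *md;
	size_t i, marked = 0;

	for (md = master->master_domains; md; md = md->next) {
		struct smmu_domain *d = md->domain;

		for (i = 0; i < d->num_invs; i++) {
			if (d->invs[i].type == INV_TYPE_ATS &&
			    d->invs[i].sid == master->sid) {
				d->invs[i].type = INV_TYPE_ATS_BROKEN;
				marked++;
			}
		}
	}
	return marked;
}
```

Even in this toy form, a timeout on one domain ends up touching every
other domain's invs array.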

Secondly, locking. We have two paths that can trigger an ATC_INV
timeout: __arm_smmu_domain_inv_range(), which takes the read side of
the rwlock on the current smmu_domain->invs, and
arm_smmu_atc_inv_master(), which doesn't take any rwlock. When these
two paths walk through master->master_domains, we would need to take
the rwlock of each of those domains. Also, the
__arm_smmu_domain_inv_range() path should skip the invs of the
current master_domain, as its rwlock is already held.

What's your opinion on these?

Given all this complexity, I started to wonder if we could have
implemented the invs as an RCU-list rather than an RCU-array: all
IOTLB tag nodes would still be allocated to add/delete/read
locklessly; all ATS nodes would be fixed in the master structure to
add/del/read with the rwlock. Then, a timeout occurring on either
path can simply mutate the ATS entries on the master directly without
going through the list of domains.
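
A rough userspace sketch of the list idea, under heavy assumptions: a
plain singly linked list stands in for the RCU-list, the rwlock is
omitted, and all names (struct inv_node, MAX_ATS_NODES, the helpers) are
hypothetical. It only shows that ATS nodes embedded in the master can be
mutated without knowing which domains they are linked into:

```c
#include <stddef.h>

/* Simplified stand-ins for the invalidation entry types (assumption). */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct inv_node {
	enum inv_type type;
	struct inv_node *next;
};

#define MAX_ATS_NODES 4	/* hypothetical cap, one node per attachment */

struct master {
	struct inv_node ats[MAX_ATS_NODES];	/* embedded, not allocated */
	size_t num_ats;
};

/* Link one of the master's embedded ATS nodes at the head of a domain list. */
static struct inv_node *master_attach_ats(struct master *m,
					  struct inv_node **head)
{
	struct inv_node *n;

	if (m->num_ats == MAX_ATS_NODES)
		return NULL;
	n = &m->ats[m->num_ats++];
	n->type = INV_TYPE_ATS;
	n->next = *head;
	*head = n;
	return n;
}

/* On a timeout, mutate the master's own nodes; no domain walk is needed. */
static void master_mark_ats_broken(struct master *m)
{
	size_t i;

	for (i = 0; i < m->num_ats; i++)
		m->ats[i].type = INV_TYPE_ATS_BROKEN;
}

/* Count the ATS nodes on a domain list that would still get an ATC_INV. */
static size_t list_count_live_ats(const struct inv_node *head)
{
	size_t live = 0;

	for (; head; head = head->next)
		if (head->type == INV_TYPE_ATS)
			live++;
	return live;
}
```

Since each domain list points at the same embedded nodes, every list
sees the change as soon as the master's nodes are retyped.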

> > So, it seems that master->ats_broken is still a cleaner solution?
>
> I don't want the invs code touching master, that is against the entire
> design.

I think I can understand the idea here: we want the invs design to
be in the common code, so anything that's driver-specific (smmu or
master) shouldn't be touched.

> Maybe a flag in the invs list itself is sufficient.

I think we would have to use INV_TYPE_ATS_BROKEN rather than a
per-invs flag: e.g., a nesting parent domain will have multiple ATS
devices, so it cannot use one flag on its big invs to separate the
broken devices from all the other healthy devices.
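
For illustration, a small userspace sketch of the issuing side with a
per-entry type. The enum values, struct inv and
invs_collect_atc_sids() are simplified assumptions, not the actual
driver code:

```c
#include <stddef.h>

/* Simplified stand-ins for the invalidation entry types (assumption). */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct inv {
	enum inv_type type;
	unsigned int id;	/* StreamID for the ATS entries */
};

/*
 * Collect the StreamIDs that should still receive an ATC_INV. Entries
 * retyped to INV_TYPE_ATS_BROKEN are skipped one by one, so healthy
 * devices sharing the same invs array are unaffected.
 */
static size_t invs_collect_atc_sids(const struct inv *invs, size_t num,
				    unsigned int *sids, size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < num && n < max; i++) {
		if (invs[i].type == INV_TYPE_ATS)
			sids[n++] = invs[i].id;
	}
	return n;
}
```

A single flag on the whole array could only stop ATC_INV for all of
these entries or for none of them.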

Nicolin