Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
From: Nicolin Chen
Date: Fri May 29 2026 - 21:32:28 EST
On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index cde2ff2dcc49b..638956e2535b4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2429,6 +2429,10 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,
> > struct arm_smmu_cmd cmd;
> > struct arm_smmu_cmdq_batch cmds;
> >
> > + /* Do not issue ATC_INV that will definitely time out */
> > + if (READ_ONCE(master->ats_broken))
> > + return 0;
> > +
> > cmd = arm_smmu_make_cmd_atc_inv_all(0, IOMMU_NO_PASID);
> > arm_smmu_cmdq_batch_init_cmd(master->smmu, &cmds, &cmd);
> > for (i = 0; i < master->num_streams; i++)
> > @@ -2651,12 +2655,18 @@ static void __arm_smmu_domain_inv_range(struct arm_smmu_invs *invs,
> > cur->id));
> > break;
> > case INV_TYPE_ATS:
> > + /* Do not issue ATC_INV that will definitely time out */
> > + if (READ_ONCE(cur->master->ats_broken))
> > + break;
>
> Yuk, this should be a bool in the invs list, not the master.
>
> If the flow is to have the core code always attach a blocked domain before
> reset_done then the invs list will naturally fix itself.
I tried an INV_TYPE_ATS_BROKEN type: during per-domain invalidation,
each batch is built from domain->invs, so the batch can carry the "invs";
if the batch times out, we can immediately mutate the ATS entries in it.
But I ran into a limitation. E.g., say a device attaches to two SVA
domains on two SSIDs. An invalidation timing out on one of the SVA
domains could mark INV_TYPE_ATS_BROKEN in that domain's own invs, but
not in the other SVA domain's invs?
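To make the limitation concrete, here is a toy user-space model of it.
This is only a sketch, not driver code: every toy_* name is made up for
illustration, and the real invs arrays are more involved than this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these types are the real driver structures. */
enum toy_inv_type { TOY_INV_ATS, TOY_INV_ATS_BROKEN };

struct toy_inv {
	enum toy_inv_type type;
	int master_id;	/* device targeted by this entry */
	int ssid;	/* substream the domain is attached on */
};

#define TOY_MAX_INVS 4

struct toy_domain {
	struct toy_inv invs[TOY_MAX_INVS];
	size_t num_invs;
};

static void toy_domain_add_ats(struct toy_domain *d, int master_id, int ssid)
{
	assert(d->num_invs < TOY_MAX_INVS);
	d->invs[d->num_invs++] = (struct toy_inv){ TOY_INV_ATS, master_id, ssid };
}

/* Timeout handler: it can only reach the invs of the domain being invalidated. */
static void toy_domain_mark_broken(struct toy_domain *d, int master_id)
{
	for (size_t i = 0; i < d->num_invs; i++)
		if (d->invs[i].master_id == master_id)
			d->invs[i].type = TOY_INV_ATS_BROKEN;
}

/* Number of ATC_INV commands this domain would still send to master_id. */
static size_t toy_domain_pending_atc(const struct toy_domain *d, int master_id)
{
	size_t n = 0;

	for (size_t i = 0; i < d->num_invs; i++)
		if (d->invs[i].master_id == master_id &&
		    d->invs[i].type == TOY_INV_ATS)
			n++;
	return n;
}
```

With one device attached to two toy domains on SSIDs 1 and 2, marking
the first domain's entries broken leaves the second domain with one
pending ATC_INV for the same device, which is the case in question.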
So it seems that master->ats_broken is still the cleaner solution?
In the same spirit of "the invs list will naturally fix itself":
we could clear master->ats_broken on every attach; if the device
has not recovered yet, arm_smmu_atc_inv_master() in commit() will
time out and set it back to true. Right?
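Roughly, the clear-on-attach flow would behave like the toy model below.
Again a user-space sketch under assumptions: the toy_* names and the
hw_responds/issued fields are hypothetical, the timeout is simulated,
and the plain bool stands in for the READ_ONCE() access in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; field and function names are hypothetical. */
struct toy_master {
	bool ats_broken;	/* the patch reads this with READ_ONCE() */
	bool hw_responds;	/* stand-in: does the device answer ATC_INV? */
	unsigned int issued;	/* ATC_INV commands actually sent */
};

/* Mirrors the early return added to arm_smmu_atc_inv_master() in the patch. */
static int toy_atc_inv_master(struct toy_master *m)
{
	assert(m != NULL);

	if (m->ats_broken)
		return 0;	/* skip a command that would time out */

	m->issued++;
	if (!m->hw_responds) {
		m->ats_broken = true;	/* simulated timeout sets the flag again */
		return -ETIMEDOUT;
	}
	return 0;
}

/* Attach: optimistically clear the flag, then let the invalidation decide. */
static int toy_attach(struct toy_master *m)
{
	assert(m != NULL);

	m->ats_broken = false;
	return toy_atc_inv_master(m);	/* stands in for the commit() step */
}
```

In this model an attach to a still-broken device costs exactly one
timed-out ATC_INV and leaves the flag set, while an attach after
recovery succeeds and keeps the flag cleared.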
Thanks
Nicolin