Re: [PATCH v2 23/24] iommu/arm-smmu: Allow non-strict in pgtable_quirks interface

From: Will Deacon
Date: Tue Aug 03 2021 - 08:35:57 EST


On Tue, Aug 03, 2021 at 01:13:12PM +0100, Robin Murphy wrote:
> On 2021-08-03 11:36, Will Deacon wrote:
> > Overall, I'm really nervous about the concurrency here and think we'd be
> > better off requiring the unbind as we have for the other domain changes.
>
> Sure, the dynamic switch is what makes it ultimately work for Doug's
> use-case (where the unbind isn't viable), but I had every expectation that
> we might need to hold back those two patches for much deeper consideration.
> It's no accident that the first 22 patches can still be usefully applied
> without them!

Oh, the rest of the series looks great which is why I jumped on this bit!

> In all honesty I don't really like this particular patch much, mostly
> because I increasingly dislike IO_PGTABLE_QUIRK_NON_STRICT at all, but since
> the interface was there it made it super easy to prove the concept. I have a
> more significant refactoring of the io-pgtable code sketched out in my mind
> already, it's just going to be more work.

Intriguing... Move the smp_wmb() into the IOVA code instead?

> > With your changes, I think quite a few things can go wrong.
> >
> > * cookie->fq_domain may be observed but iovad->fq could be NULL
>
> Good point, I guess that already technically applies (if iovad->fq sat in a
> write buffer long enough), but it certainly becomes far easier to provoke.
> However a barrier after assigning fq_domain (as mentioned above) paired with
> the control dependency around the queue_iova() call would also fix that,
> right?
>
> > * We can miss the smp_wmb() in the pgtable code but end up deferring the
> > IOVA reclaim
> > * iommu_change_dev_def_domain() only holds the group mutex afaict, so can
> > possibly run concurrently with itself on the same domain?
> > * iommu_dma_init_fq() can flip the domain type back from
> > IOMMU_DOMAIN_DMA_FQ to IOMMU_DOMAIN_DMA on the error path
> > * set_pgtable_quirks() isn't atomic, which probably is ok for now, but the
> > moment we use it anywhere else it's dangerous
>
> In other words, IO_PGTABLE_QUIRK_NON_STRICT is definitely the problem. I'll
> have a hack on that this afternoon, and if it starts to look rabbit-holey
> I'll split this bit off and post v3 of the rest of the series.
>
> If all the io-pgtable and fq behaviour for a given call could be consistent
> based on a single READ_ONCE(cookie->fq_domain) in iommu_dma_unmap(), do you
> see any remaining issues other than the point above?

I'd have to see the patches, and I didn't look exhaustively at the current
stuff. But yes, I think the basic flow needs to be that there is an atomic
flag (i.e. cookie->fq_domain) which indicates which mode is being used
and if you flip that concurrently then you need to guarantee that everybody
is either using the old mode or the new mode and not some halfway state in
between.
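
To make that flow concrete, here is a minimal userspace sketch using C11
atomics. It is not the kernel code and not a proposed patch: struct
dma_cookie, struct flush_queue, enable_flush_queue() and unmap_one() are
all hypothetical names invented for illustration. The assumption is that
the mode flag and the queue pointer are the same atomic object, so a
reader that samples it once can never see "deferred mode" without also
seeing an initialised queue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
struct flush_queue {
	int pending;			/* number of deferred reclaims */
};

struct dma_cookie {
	struct flush_queue fq_storage;
	_Atomic(struct flush_queue *) fq;	/* NULL means strict mode */
};

/*
 * Writer: initialise the queue completely, then publish it with release
 * ordering so the initialisation is visible to anyone who sees the pointer.
 */
static void enable_flush_queue(struct dma_cookie *c)
{
	c->fq_storage.pending = 0;
	atomic_store_explicit(&c->fq, &c->fq_storage, memory_order_release);
}

/*
 * Reader: sample the mode exactly once per call and base every later
 * decision on that snapshot. Returns true if the reclaim was deferred.
 */
static bool unmap_one(struct dma_cookie *c)
{
	struct flush_queue *fq =
		atomic_load_explicit(&c->fq, memory_order_acquire);

	if (fq) {
		/* Deferred mode: the queue is guaranteed to be initialised. */
		fq->pending++;
		return true;
	}

	/* Strict mode: a real driver would invalidate synchronously here. */
	return false;
}
```

A caller that runs unmap_one() before enable_flush_queue() gets the strict
path, and one that runs it afterwards gets the deferred path; no call can
mix the two, because nothing re-reads c->fq after the initial load. The
sketch only covers the one-way switch to deferred mode and leaves the
increment of pending unsynchronised, which a real queue would need to
address.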

Will