Re: [PATCH V4] iommu/io-pgtable-arm: Optimise non-coherent unmap

From: Jason Gunthorpe
Date: Mon Sep 09 2024 - 11:09:42 EST


On Wed, Sep 04, 2024 at 09:24:34AM -0700, Rob Clark wrote:
> Btw, this seems to be causing iommu faults for me for what (according
> to a sw pgtable walk) should be a valid mapping, indicating
> missing/incomplete tlb invalidation. This is with drm/msm (which
> probably matters, since it implements it's own iommu_flush_ops) on
> x1e80100 (which probably doesn't matter.. but it is an mmu-500 in case
> it does).
>
> I _think_ what is causing this is the change in ordering of
> __arm_lpae_clear_pte() (dma_sync_single_for_device() on the pgtable
> memory) and io_pgtable_tlb_flush_walk(). I'm not entirely sure how
> this patch is supposed to work correctly in the face of other
> concurrent translations (to buffers unrelated to the one being
> unmapped(), because after the io_pgtable_tlb_flush_walk() we can have
> stale data read back into the tlb.

You mean this?

if (!iopte_leaf(pte, lvl, iop->fmt)) {
+ __arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+
/* Also flush any partial walks */
io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
ARM_LPAE_GRANULE(data));
__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));

I would say it should work because
1) The pte is cleared and cache flushed before the iotlb is cleared
by the added __arm_lpae_clear_pte()
2) This is not a 'shared table' since it is fully covered by the
size being unmapped. The caller must ensure there are no
inersecting concurrent map/unmaps.
3) The double zeroing doesn't matter because of #2, no races are
permitted.

Jason