Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE
From: Sven Peter
Date: Thu Sep 02 2021 - 14:19:43 EST
On Wed, Sep 1, 2021, at 23:10, Alyssa Rosenzweig wrote:
> > My biggest issue is that I do not understand how this function is supposed
> > to be used correctly. It would work fine as-is if it only ever gets passed buffers
> > allocated by the coherent API but there's not way to check or guarantee that.
> > There may also be callers making assumptions that no longer hold when
> > iovad->granule > PAGE_SIZE.
> >
> > Regarding your case: I'm not convinced the function is meant to be used there.
> > If I understand it correctly, your code first allocates memory with dma_alloc_coherent
> > (which possibly creates a sgt internally and then maps it with iommu_map_sg),
> > then coerces that back into a sgt with dma_get_sgtable, and then maps that sgt to
> > another iommu domain with dma_map_sg while assuming that the result will be contiguous
> > in IOVA space. It'll work out because dma_alloc_coherent is the very thing
> > meant to allocate pages that can be mapped into kernel and device VA space
> > as a single contiguous block and because both of your IOMMUs are different
> > instances of the same HW block. Anything allocated by dma_alloc_coherent for the
> > first IOMMU will have the right shape that will allow it to be mapped as
> > a single contiguous block for the second IOMMU.
> >
> > What could be done in your case is to instead use the IOMMU API,
> > allocate the pages yourself (while ensuring the sgt your create is made up
> > of blocks with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
> > and then just use iommu_map_sg for both domains which actually comes with the
> > guarantee that the result will be a single contiguous block in IOVA space and
> > doesn't required the sgt roundtrip.
>
> In principle I agree. I am getting the sense this function can't be used
> correctly in general, and yet is the function that's meant to be used.
> If my interpretation of prior LKML discussion holds, the problems are
> far deeper than my code or indeed page size problems...
Right, which makes reasoning about this function and its behavior if the
IOMMU pages size is unexpected very hard for me. I'm not opposed to just
keeping this function as-is when there's a mismatch between PAGE_SIZE and
the IOMMU page size (and it will probably work that way) but I'd like to
be sure that won't introduce unexpected behavior.
>
> If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
> that dance were wrapped up in a safe helper function instead of open
> coding it in every driver that does cross device sharing.
>
> We might even call that helper... hmm... dma_map_sg.... *ducks*
>
There might be another way to do this correctly. I'm likely just a little
bit biased because I've spent the past weeks wrapping my head around the
IOMMU and DMA APIs and when all you have is a hammer everything looks like
a nail.
But dma_map_sg operates at the DMA API level and at that point the dma-ops
for two different devices could be vastly different.
In the worst case one of them could be behind an IOMMU that can easily map
non-contiguous pages while the other one is directly connected to the bus and
can't even access >4G pages without swiotlb support.
It's really only possible to guarantee that it will map N buffers to <= N
DMA-addressable buffers (possibly by using an IOMMU or swiotlb internally) at
that point.
On the IOMMU API level you have much more information available about the actual
hardware and can prepare the buffers in a way that makes both devices happy.
That's why iommu_map_sgtable combined with iovad->granule aligned sgt entries
can actually guarantee to map the entire list to a single contiguous IOVA block.
Sven