Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE
From: Robin Murphy
Date: Mon Aug 09 2021 - 13:41:56 EST
On 2021-08-07 12:47, Sven Peter via iommu wrote:
On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:
On 2021-08-06 16:55, Sven Peter via iommu wrote:
@@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
+ /*
+ * If the IOMMU pagesize is larger than the CPU pagesize we will
+ * very likely run into sgs with a physical address that is not aligned
+ * to an IOMMU page boundary. Fall back to just mapping every entry
+ * independently with __iommu_dma_map then.
Scatterlist segments often don't have nicely aligned ends, which is why
we already align things to IOVA granules in main loop here. I think in
principle we'd just need to move the non-IOVA-aligned part of the
address from sg->page to sg->offset in the temporary transformation for
the rest of the assumptions to hold. I don't blame you for being timid
about touching that, though - it took me 3 tries to get right when I
first wrote it...
I've spent some time with that code now and I think we cannot use it
but have to fall back to iommu_dma_map_sg_swiotlb (even though that swiotlb
part is a lie then):
When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr
is aligned to PAGE_SIZE but has an offset of 0x1000 from something
the IOMMU can map.
Now this would result in s->offset = -0x1000 which is already weird
enough.
Offset is unsigned (and 32bit) so this will actually look like
s->offset = 0xfffff000 then, which isn't much better.
And then sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and
we'll map some random memory in iommu_map_sg_atomic and a little bit later
everything explodes.
Now I could probably adjust the phys addr backwards and make sure offset is
always positive (and possibly larger than PAGE_SIZE) and later restore it
in __finalise_sg then but I feel like that's pushing this a little bit too far.
Yes, that's what I meant. At a quick guess, something like the
completely untested diff below. It really comes down to what we want to
achieve here - if it's just to make this thing work at all, then I'd
favour bolting on the absolute minimum changes, possibly even cheating
by tainting the kernel and saying all bets are off instead of trying to
handle the more involved corners really properly. However if you want to
work towards this being a properly-supported thing, then I think it's
worth generalising the existing assumptions of page alignment from the
beginning.
Robin.
----->8-----
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 58cc7297aab5..73aeaa8bfc73 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -895,8 +895,8 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
unsigned int s_length = sg_dma_len(s);
unsigned int s_iova_len = s->length;
- s->offset += s_iova_off;
- s->length = s_length;
+ sg_set_page(s, phys_to_page(sg_phys(s)), s_length,
+ s_iova_off & ~PAGE_MASK);
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
@@ -940,10 +940,9 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
int i;
for_each_sg(sg, s, nents, i) {
- if (sg_dma_address(s) != DMA_MAPPING_ERROR)
- s->offset += sg_dma_address(s);
if (sg_dma_len(s))
- s->length = sg_dma_len(s);
+ sg_set_page(s, phys_to_page(sg_phys(s)), sg_dma_len(s),
+ sg_dma_address(s) & ~PAGE_MASK);
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
}
@@ -1019,15 +1018,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
* stashing the unaligned parts in the as-yet-unused DMA fields.
*/
for_each_sg(sg, s, nents, i) {
- size_t s_iova_off = iova_offset(iovad, s->offset);
+ phys_addr_t s_phys = sg_phys(s);
+ size_t s_iova_off = iova_offset(iovad, s_phys);
size_t s_length = s->length;
size_t pad_len = (mask - iova_len + 1) & mask;
sg_dma_address(s) = s_iova_off;
sg_dma_len(s) = s_length;
- s->offset -= s_iova_off;
s_length = iova_align(iovad, s_length + s_iova_off);
- s->length = s_length;
+ sg_set_page(s, phys_to_page(s_phys - s_iova_off),
+ s_length, s->offset ^ s_iova_off);
/*
* Due to the alignment of our single IOVA allocation, we can