Re: [RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

From: Suravee Suthikulpanit
Date: Wed Jan 17 2018 - 22:25:31 EST


Hi Alex,

On 1/9/18 3:53 AM, Alex Williamson wrote:
> On Wed, 27 Dec 2017 04:20:34 -0500
> Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index e30e29a..f000844 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c

[...]

> > @@ -479,6 +486,40 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> >  	return unlocked;
> >  }
> > +/*
> > + * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
> > + * Therefore, when using the IOTLB flush sync interface, VFIO needs to
> > + * keep track of these regions (currently using a list).
> > + *
> > + * This value specifies the maximum number of regions for each IOTLB flush sync.
> > + */
> > +#define VFIO_IOMMU_TLB_SYNC_MAX	512

> Is this an arbitrary value or are there non-obvious considerations for
> this value should we want to further tune it in the future?

This is just an arbitrary value for now, which we could tune further. On some
dGPUs that I have been testing with, I have seen a maximum of ~1500 regions
within a single unmap call. In most cases, I see fewer than 100 regions per
unmap call. The structure is currently 40 bytes, so I figured capping the list
at 512 entries (~20 KB) is reasonable. Let me know what you think.
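
For reference, the per-region bookkeeping I have in mind looks roughly like
this (a sketch; field names are illustrative and may differ from the actual
patch):

	/*
	 * One record per unmapped IOVA range, queued on a list until
	 * iommu_tlb_sync() completes and the pages can be unpinned.
	 * This is 40 bytes on x86-64, so capping the list at
	 * VFIO_IOMMU_TLB_SYNC_MAX (512) entries bounds it at ~20 KB.
	 */
	struct vfio_regions {
		struct list_head list;	/* entry on the deferred-unpin list */
		dma_addr_t iova;	/* start of the unmapped IOVA range */
		phys_addr_t phys;	/* physical address backing the range */
		size_t len;		/* length of the unmapped range */
	};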

[...]

> > @@ -887,8 +946,14 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
> >  		break;
> >  	}
> >
> > -	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
> > -		iommu_unmap(domain->domain, iova, PAGE_SIZE);
> > +	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
> > +		unmapped = iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
> > +		if (WARN_ON(!unmapped))
> > +			break;
> > +		iommu_tlb_range_add(domain->domain, iova, unmapped);
> > +	}
> > +	if (unmapped)
> > +		iommu_tlb_sync(domain->domain);

> Using unmapped here seems a little sketchy, for instance if we got back
> zero on the last call to iommu_unmap_fast() but had other ranges queued
> for flush. Do we even need a WARN_ON and break here, are we just
> trying to skip adding a zero range? The intent is that we either leave
> this function with everything mapped or nothing mapped, so perhaps we
> should warn and continue. Assuming a spurious sync is ok, we could
> check (i < npage) for the sync condition, the only risk being we had no
> mappings at all and therefore no unmaps.
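
Good point. If we keep the fast unmap path here, I believe the
warn-and-continue variant you describe would look something like this
(untested sketch):

	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
		unmapped = iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
		if (WARN_ON(!unmapped))
			continue;	/* skip queuing a zero range, keep unwinding */
		iommu_tlb_range_add(domain->domain, iova, unmapped);
	}
	/*
	 * (i < npage) means the mapping loop stopped early and we unwound;
	 * the only spurious sync is when the very first map failed and
	 * there was nothing to unmap, which should be harmless.
	 */
	if (i < npage)
		iommu_tlb_sync(domain->domain);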

> TBH, I wonder if this function is even needed anymore or if the mapping
> problem in amd_iommu has since been fixed.

Actually, I never hit this execution path in my test runs. I could just leave
this unchanged and use the slow unmap path to simplify the logic. I'm not aware
of the history of why this logic is needed for AMD IOMMU. Is this a bug in the
driver or in the hardware?

> Also, I'm not sure why you're gating adding fast flushing to amd_iommu
> on vfio making use of it. These can be done independently. Thanks,

Currently, the fast unmap interface is mainly called by VFIO, so I thought I
would submit the patches together for review. If you would prefer, I can
submit the IOMMU part separately.

Thanks,
Suravee