Re: [PATCH v4 15/32] vfio: introduce KVM-owned IOMMU type

From: Matthew Rosato
Date: Tue Mar 15 2022 - 09:49:36 EST


On 3/14/22 5:38 PM, Jason Gunthorpe wrote:
On Mon, Mar 14, 2022 at 03:44:34PM -0400, Matthew Rosato wrote:

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9394aa9444c1..0bec97077d61 100644
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -77,6 +77,7 @@ struct vfio_iommu {
bool nesting;
bool dirty_page_tracking;
bool container_open;
+ bool kvm;
struct list_head emulated_iommu_groups;
};
@@ -2203,7 +2204,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
goto out_free_group;
ret = -EIO;
- domain->domain = iommu_domain_alloc(bus);
+
+ if (iommu->kvm)
+ domain->domain = iommu_domain_alloc_type(bus, IOMMU_DOMAIN_KVM);
+ else
+ domain->domain = iommu_domain_alloc(bus);
+
if (!domain->domain)
goto out_free_domain;
@@ -2552,6 +2558,9 @@ static void *vfio_iommu_type1_open(unsigned long arg)
case VFIO_TYPE1v2_IOMMU:
iommu->v2 = true;
break;
+ case VFIO_KVM_IOMMU:
+ iommu->kvm = true;
+ break;

Same remark for this - but more - this is called KVM but it doesn't
accept a kvm FD or any thing else to link the domain to the KVM
in-use.

Right... The name is poor, but with the current design the KVM association comes shortly after. To summarize, with this series, the following relevant steps occur:

1) VFIO_SET_IOMMU: Indicate we wish to use the alternate IOMMU domain
-> At this point, the IOMMU will reject any maps (no KVM, no guest table anchor)
2) KVM ioctl "start":
-> Register the KVM with the IOMMU domain
-> At this point, IOMMU will still reject any maps (no guest table anchor)
3) KVM ioctl "register ioat"
-> Register the guest DMA table head with the IOMMU domain
-> now IOMMU maps are allowed

The rationale for splitting steps 1 and 2 is that VFIO_SET_IOMMU doesn't have a mechanism for specifying more than the type as an arg, no? Otherwise yes, you could specify a kvm fd at this point and it would have some other advantages (e.g. skip the notifier). But we still can't use the IOMMU for mapping until step 3.

The rationale for splitting steps 2 and 3 is twofold: 1) during init, we simply don't know where the guest anchor will be when we allocate the domain and 2) the guest can technically clear and re-initialize its DMA space during the life of the guest, moving the location of the table anchor. In that case we would receive another ioctl operation to unregister the guest table anchor and would again reject any map operation until a new table location is provided.
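
To make the ordering concrete, here is a rough user-space model of the gating described above. It is only a sketch, not code from the series: the struct, the model_* function names and the choice of -EINVAL as the rejection code are all made up for illustration. It shows that maps are refused until both the KVM and the guest table anchor have been registered, and are refused again once the anchor is unregistered.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the KVM-owned domain state; not the real layout. */
struct kvm_domain_model {
	const void *kvm;	/* NULL until step 2 (KVM ioctl "start") */
	uint64_t anchor;	/* guest DMA table head, valid after step 3 */
	bool anchor_valid;
};

/* Step 1: VFIO_SET_IOMMU selects the domain type; nothing is linked yet. */
static void model_set_iommu(struct kvm_domain_model *d)
{
	d->kvm = NULL;
	d->anchor = 0;
	d->anchor_valid = false;
}

/* Step 2: register the KVM with the domain. Maps are still rejected. */
static void model_kvm_start(struct kvm_domain_model *d, const void *kvm)
{
	d->kvm = kvm;
}

/* Step 3: register the guest table anchor; only meaningful once a KVM is set. */
static int model_register_ioat(struct kvm_domain_model *d, uint64_t anchor)
{
	if (!d->kvm)
		return -EINVAL;	/* assumed error code */
	d->anchor = anchor;
	d->anchor_valid = true;
	return 0;
}

/* Guest cleared its DMA space: drop the anchor, maps are rejected again. */
static void model_unregister_ioat(struct kvm_domain_model *d)
{
	d->anchor = 0;
	d->anchor_valid = false;
}

/* A map request succeeds only when both the KVM and the anchor are present. */
static int model_map(const struct kvm_domain_model *d)
{
	if (!d->kvm || !d->anchor_valid)
		return -EINVAL;	/* assumed error code */
	return 0;
}
```

In this model a guest that re-initializes its DMA space simply goes through model_unregister_ioat() followed by model_register_ioat() with the new location, without touching the KVM association made in step 2.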