RE: [RFC] /dev/ioasid uAPI proposal
From: Tian, Kevin
Date: Fri Jun 04 2021 - 04:38:34 EST
> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Friday, June 4, 2021 5:44 AM
>
> > Based on that observation we can say as soon as the user wants to use
> > an IOMMU that does not support DMA_PTE_SNP in the guest we can still
> > share the IO page table with IOMMUs that do support DMA_PTE_SNP.
Page table sharing between incompatible IOMMUs is not critical. I prefer
disallowing sharing in such cases as the starting point, i.e. the user
needs to create separate IOASIDs for such devices.
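
As a rough illustration of that starting point, here is a userspace-side model only. The struct and function names (ioasid_pool, pool_init, pool_ioasid_for) are hypothetical and not part of the proposal, and next_id merely stands in for whatever IOASID_ALLOC would return. The policy amounts to keying IOASID selection on whether the device's IOMMU can enforce snoop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one IOASID slot per snoop capability, -1 = not allocated. */
struct ioasid_pool {
    int by_snoop[2];  /* [0]: IOMMU cannot enforce snoop, [1]: IOMMU can */
    int next_id;      /* stand-in for the value IOASID_ALLOC would return */
};

static void pool_init(struct ioasid_pool *p)
{
    assert(p != NULL);
    p->by_snoop[0] = -1;
    p->by_snoop[1] = -1;
    p->next_id = 1;
}

/*
 * Return the IOASID a device should attach to. Devices share an IOASID only
 * when their IOMMUs agree on snoop enforcement; otherwise a separate IOASID
 * is allocated on first use.
 */
static int pool_ioasid_for(struct ioasid_pool *p, bool iommu_enforces_snoop)
{
    int idx = iommu_enforces_snoop ? 1 : 0;

    assert(p != NULL);
    if (p->by_snoop[idx] < 0)
        p->by_snoop[idx] = p->next_id++;
    return p->by_snoop[idx];
}
```

With this model, two devices behind snoop-capable IOMMUs get the same IOASID, while a device behind an IOMMU that cannot block no-snoop always lands in a different one.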
>
> If your goal is to prioritize IO page table sharing, sure. But because
> we cannot atomically transition from one to the other, each device is
> stuck with the page tables it has, so the history of the VM becomes a
> factor in the performance characteristics.
>
> For example if device {A} is backed by an IOMMU capable of blocking
> no-snoop and device {B} is backed by an IOMMU which cannot block
> no-snoop, then booting VM1 with {A,B} and later removing device {B}
> would result in ongoing wbinvd emulation versus a VM2 only booted with
> {A}.
>
> Type1 would use separate IO page tables (domains/ioasids) for these such
> that VM1 and VM2 have the same characteristics at the end.
>
> Does this become user defined policy in the IOASID model? There's
> quite a mess of exposing sufficient GET_INFO for an IOASID for the user
> to know such properties of the IOMMU, plus maybe we need mapping flags
> equivalent to IOMMU_CACHE exposed to the user, preventing sharing an
> IOASID that could generate IOMMU faults, etc.
IOMMU_CACHE is a fixed attribute of a given IOMMU. So it's better to
convey this info to userspace via GET_INFO for a device_label, before
creating any IOASID. But overall I agree that careful thinking is required
about how to organize such info reporting (per-fd, per-device, per-ioasid)
to userspace.
>
> > > > It doesn't solve the problem to connect kvm to AP and kvmgt though
> > >
> > > It does not, we'll probably need a vfio ioctl to gratuitously announce
> > > the KVM fd to each device. I think some devices might currently fail
> > > their open callback if that linkage isn't already available though, so
> > > it's not clear when that should happen, ie. it can't currently be a
> > > VFIO_DEVICE ioctl as getting the device fd requires an open, but this
> > > proposal requires some availability of the vfio device fd without any
> > > setup, so presumably that won't yet call the driver open callback.
> > > Maybe that's part of the attach phase now... I'm not sure, it's not
> > > clear when the vfio device uAPI starts being available in the process
> > > of setting up the ioasid. Thanks,
> >
> > At a certain point we maybe just have to stick to backward compat, I
> > think. Though it is useful to think about green field alternates to
> > try to guide the backward compat design..
>
> I think more to drive the replacement design; if we can't figure out
> how to do something other than backwards compatibility trickery in the
> kernel, it's probably going to bite us. Thanks,
>
I'm a bit lost on the desired flow in your minds. Here is one flow based
on my understanding of this discussion. Please comment whether it
matches your thinking:
0) ioasid_fd is created and registered to KVM via KVM_ADD_IOASID_FD;
1) Qemu binds dev1 to ioasid_fd;
2) Qemu calls IOASID_GET_DEV_INFO for dev1. This will carry IOMMU_CACHE
info, i.e. whether the underlying IOMMU can enforce snoop;
3) Qemu plans to create a gpa_ioasid, and attach dev1 to it. Here Qemu
needs to figure out whether dev1 wants to do no-snoop. This might
be based on a fixed vendor/class list or specified by the user;
4) gpa_ioasid = ioctl(ioasid_fd, IOASID_ALLOC); At this point a 'snoop'
flag is specified to decide the page table format, which is supposed
to match dev1;
5) Qemu attaches dev1 to gpa_ioasid via VFIO_ATTACH_IOASID. At this
point, specify snoop/no-snoop again. If it is not supported by the related
IOMMU or differs from what gpa_ioasid has, the attach fails;
6) Call KVM to update the snoop requirement via KVM_UPDATE_IOASID_FD.
This triggers ioasidfd_for_each_ioasid();
Later, when dev2 is attached to gpa_ioasid, the same flow is followed. This
implies that KVM_UPDATE_IOASID_FD is called only when a new IOASID is
created or an existing IOASID is destroyed, because all devices under an
IOASID should have the same snoop requirement.
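
To make the checks in steps 2, 4, 5 and 6 concrete, here is a small self-contained model of the flow. It is a sketch under assumptions, not the uAPI itself: every struct and function name is a hypothetical stand-in for the ioctls above, 'snoop' is taken to mean "snoop is enforced", and the link from a non-snoop IOASID to wbinvd emulation in KVM is my reading of the discussion rather than something defined here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins for the proposed uAPI. */
struct dev_info {
    bool iommu_enforces_snoop;  /* step 2: what IOASID_GET_DEV_INFO would report */
};

struct ioasid {
    bool snoop;       /* step 4: page table format fixed at allocation time */
    int nr_devices;
};

struct kvm_state {
    bool need_wbinvd_emulation;  /* step 6: derived from the set of IOASIDs */
};

/* Step 4: allocate an IOASID with a fixed snoop format. */
static struct ioasid ioasid_alloc(bool snoop)
{
    struct ioasid ioas = { .snoop = snoop, .nr_devices = 0 };
    return ioas;
}

/*
 * Step 5: attach fails if the IOMMU cannot honor the requested snoop
 * behavior, or if the request differs from what the IOASID was created with.
 */
static int attach_ioasid(struct ioasid *ioas, const struct dev_info *dev,
                         bool snoop)
{
    assert(ioas != NULL && dev != NULL);
    if (snoop && !dev->iommu_enforces_snoop)
        return -1;  /* not supported by the related IOMMU */
    if (snoop != ioas->snoop)
        return -1;  /* differs from what the IOASID has */
    ioas->nr_devices++;
    return 0;
}

/* Step 6: walk every IOASID; any non-snoop IOASID sets the requirement. */
static void kvm_update_ioasid_fd(struct kvm_state *kvm,
                                 const struct ioasid *all, size_t n)
{
    assert(kvm != NULL);
    kvm->need_wbinvd_emulation = false;
    for (size_t i = 0; i < n; i++)
        if (!all[i].snoop)
            kvm->need_wbinvd_emulation = true;
}
```

Because the snoop property is fixed per IOASID in this model, attaching another compatible device never changes the result of the walk, which is why the KVM update only needs to run on IOASID creation or destruction.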
Thanks
Kevin