Re: [PATCH v3 24/30] vfio-pci/zdev: wire up group notifier

From: Jason Gunthorpe
Date: Thu Feb 10 2022 - 18:45:15 EST


On Thu, Feb 10, 2022 at 01:59:35PM -0500, Matthew Rosato wrote:

> OK, looking into this, thanks for the pointer... Sounds to me like we then
> want to make the determination upfront and then ensure the right iommu
> domain ops are registered for the device sometime before creation, based
> upon the usecase -- general userspace: s390_iommu_ops (existing), kvm:
> s390_kvm_iommu_domain (new).

Yes, that is the idea. I expect there will be many types of these
special iommu domains. eg Intel has talked about directly using the
KVM CPU page tabe as the IOMMU page table.

> > When the special creation flow is triggered you'd just create one of
> > these with the proper ops already setup. >
> > We are imagining a special ioctl to create these things and each IOMMU
> > HW driver can supply a unique implementation suited to their HW
> > design.
>
> But I haven't connected the dots on this part -- At the end of the day for
> this 'special creation flow' I need the kvm + starting point of the guest
> table + format before we let the new s390_kvm_iommu_domain start doing
> automatic map/unmap during RPCIT intercept -- This initial setup has to come
> from a special ioctl as you say, but where do you see it living? I could
> certainly roll my own via a KVM ioctl or whatever, but it sounds like you're
> also referring to a general-purpose ioctl to encompass each of the different
> unique implementations, with this s390 kvm approach being one.

So, the ioctl will need, as input, a kvm FD and an iommufd FD, and
additional IOMMU driver specific data (format, starting, etc).

The kvm supplies the context for the RPCIT to be captured in

The result is the creation of an iommu_domain inside iommufd, done by
some iommu_ops->alloc_domain_xxxx() driver callback

Which FD has the ioctl it is a bit of an aesthetic choice, but I
predict that iommufd makes the most sense since an object is being
created inside iommfd.

This flow is very similar to the 'userspace page table' flow others
are looking at, but has the extra twist that a KVM FD is needed to supply
the CPU page table.

It may overlap nicely with the intel direction I mentioned. It is just
ugly layering wise that KVM is getting shoved into platform code and
uapis all over the place, but I suppose that is unavoidable. And the
required loose coupling with the kvm module means all kinds of
symbol_gets'etc :(

Jason