Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe
Date: Wed Jun 02 2021 - 12:01:47 EST


On Wed, Jun 02, 2021 at 02:20:15AM +0000, Tian, Kevin wrote:
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Wednesday, June 2, 2021 6:22 AM
> >
> > On Tue, 1 Jun 2021 07:01:57 +0000
> > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > >
> > > I summarized five opens here, about:
> > >
> > > 1) Finalizing the name to replace /dev/ioasid;
> > > 2) Whether one device is allowed to bind to multiple IOASID fd's;
> > > 3) Carry device information in invalidation/fault reporting uAPI;
> > > 4) What should/could be specified when allocating an IOASID;
> > > 5) The protocol between vfio group and kvm;
> > >
> > ...
> > >
> > > For 5), I'd expect Alex to chime in. Per my understanding looks the
> > > original purpose of this protocol is not about I/O address space. It's
> > > for KVM to know whether any device is assigned to this VM and then
> > > do something special (e.g. posted interrupt, EPT cache attribute, etc.).
> >
> > Right, the original use case was for KVM to determine whether it needs
> > to emulate invlpg, so it needs to be aware when an assigned device is
>
> invlpg -> wbinvd :)
>
> > present and be able to test if DMA for that device is cache
> > coherent.

Why is this such a strong linkage to VFIO and not just a 'hey kvm
emulate wbinvd' flag from qemu?

I briefly didn't see any obvios linkage in the arch code, just some
dead code:

$ git grep iommu_noncoherent
arch/x86/include/asm/kvm_host.h: bool iommu_noncoherent;
$ git grep iommu_domain arch/x86
arch/x86/include/asm/kvm_host.h: struct iommu_domain *iommu_domain;

Huh?

It kind of looks like the other main point is to generate the
VFIO_GROUP_NOTIFY_SET_KVM which is being used by two VFIO drivers to
connect back to the kvm data

But that seems like it would have been better handled with some IOCTL
on the vfio_device fd to import the KVM to the driver not this
roundabout way?

> > The user, QEMU, creates a KVM "pseudo" device representing the vfio
> > group, providing the file descriptor of that group to show ownership.
> > The ugly symbol_get code is to avoid hard module dependencies, ie. the
> > kvm module should not pull in or require the vfio module, but vfio will
> > be present if attempting to register this device.
>
> so the symbol_get thing is not about the protocol itself. Whatever protocol
> is defined, as long as kvm needs to call vfio or ioasid helper function, we
> need define a proper way to do it. Jason, what's your opinion of alternative
> option since you dislike symbol_get?

The symbol_get was to avoid module dependencies because bringing in
vfio along with kvm is not nice.

The symbol get is not nice here, but unless things can be truely moved
to lower levels of code where module dependencies are not a problem (eg
kvm to iommu is a non-issue) I don't see much of a solution.

Other cases like kvmgt or AP would be similarly fine to have had a
kvmgt to kvm module dependency.

> > All of these use cases are related to the IOMMU, whether DMA is
> > coherent, translating device IOVA to GPA, and an acceleration path to
> > emulate IOMMU programming in kernel... they seem pretty relevant.
>
> One open is whether kvm should hold a device reference or IOASID
> reference. For DMA coherence, it only matters whether assigned
> devices are coherent or not (not for a specific address space). For kvmgt,
> it is for recoding kvm pointer in mdev driver to do write protection. For
> ppc, it does relate to a specific I/O page table.
>
> Then I feel only a part of the protocol will be moved to /dev/ioasid and
> something will still remain between kvm and vfio?

Honestly I would try not to touch this too much :\

Jason