Re: [RFC] /dev/ioasid uAPI proposal

From: Alex Williamson
Date: Wed Jun 02 2021 - 13:11:26 EST


On Wed, 2 Jun 2021 13:01:40 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, Jun 02, 2021 at 02:20:15AM +0000, Tian, Kevin wrote:
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Wednesday, June 2, 2021 6:22 AM
> > >
> > > On Tue, 1 Jun 2021 07:01:57 +0000
> > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> > > >
> > > > I summarized five opens here, about:
> > > >
> > > > 1) Finalizing the name to replace /dev/ioasid;
> > > > 2) Whether one device is allowed to bind to multiple IOASID fd's;
> > > > 3) Carry device information in invalidation/fault reporting uAPI;
> > > > 4) What should/could be specified when allocating an IOASID;
> > > > 5) The protocol between vfio group and kvm;
> > > >
> > > ...
> > > >
> > > > For 5), I'd expect Alex to chime in. As I understand it, the
> > > > original purpose of this protocol is not about I/O address space. It's
> > > > for KVM to know whether any device is assigned to this VM and then
> > > > do something special (e.g. posted interrupt, EPT cache attribute, etc.).
> > >
> > > Right, the original use case was for KVM to determine whether it needs
> > > to emulate invlpg, so it needs to be aware when an assigned device is
> >
> > invlpg -> wbinvd :)

Oops, of course.

> > > present and be able to test if DMA for that device is cache
> > > coherent.
>
> Why is this such a strong linkage to VFIO and not just a 'hey kvm
> emulate wbinvd' flag from qemu?

IIRC, wbinvd has host implications: a malicious user could tell KVM to
emulate wbinvd, then run the op in a loop and induce a disproportionate
load on the system. We therefore wanted a mechanism that enables wbinvd
emulation only when it is actually required.

> I looked briefly and didn't see any obvious linkage in the arch code, just some
> dead code:
>
> $ git grep iommu_noncoherent
> arch/x86/include/asm/kvm_host.h: bool iommu_noncoherent;
> $ git grep iommu_domain arch/x86
> arch/x86/include/asm/kvm_host.h: struct iommu_domain *iommu_domain;
>
> Huh?

Cruft from legacy KVM device assignment, I assume. What you're looking
for is:

kvm_vfio_update_coherency
  kvm_arch_register_noncoherent_dma
    atomic_inc(&kvm->arch.noncoherent_dma_count);

need_emulate_wbinvd
  kvm_arch_has_noncoherent_dma
    atomic_read(&kvm->arch.noncoherent_dma_count);

There are a couple other callers that I'm not as familiar with.
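
To make the two call chains above concrete, here is a rough userspace
model of the counting pattern. It is only a sketch, not the kernel code:
struct vm_model and every model_* name are hypothetical stand-ins, C11
atomics replace the kernel's atomic_t, and the unregister helper is an
assumption added for symmetry rather than something quoted in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM arch state (kvm->arch). */
struct vm_model {
	atomic_int noncoherent_dma_count;
};

/* Models the register path: one more non-coherent DMA user for this VM. */
static void model_register_noncoherent_dma(struct vm_model *vm)
{
	atomic_fetch_add(&vm->noncoherent_dma_count, 1);
}

/* Assumed counterpart for when a non-coherent DMA user goes away. */
static void model_unregister_noncoherent_dma(struct vm_model *vm)
{
	int old = atomic_fetch_sub(&vm->noncoherent_dma_count, 1);

	/* Unbalanced unregister would drive the count negative. */
	assert(old > 0);
	(void)old;
}

/* Models the query side: is any non-coherent DMA user registered? */
static bool model_has_noncoherent_dma(struct vm_model *vm)
{
	return atomic_load(&vm->noncoherent_dma_count) > 0;
}

/* wbinvd emulation is only wanted while the count is non-zero. */
static bool model_need_emulate_wbinvd(struct vm_model *vm)
{
	return model_has_noncoherent_dma(vm);
}
```

The point of the counter is that userspace never sets a "please emulate
wbinvd" flag directly; in this model, emulation turns on only while at
least one non-coherent user is registered and turns off again once the
last one is removed.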

> It kind of looks like the other main point is to generate the
> VFIO_GROUP_NOTIFY_SET_KVM which is being used by two VFIO drivers to
> connect back to the kvm data
>
> But that seems like it would have been better handled with some IOCTL
> on the vfio_device fd to import the KVM to the driver not this
> roundabout way?

Then QEMU would need to know which drivers require KVM knowledge? The
existing approach allowed transparent backwards compatibility with
userspace. Thanks,

Alex