Re: [RFC] /dev/ioasid uAPI proposal

From: David Gibson
Date: Mon Jun 07 2021 - 23:01:23 EST


On Tue, Jun 01, 2021 at 04:22:25PM -0600, Alex Williamson wrote:
> On Tue, 1 Jun 2021 07:01:57 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> >
> > I summarized five opens here, about:
> >
> > 1) Finalizing the name to replace /dev/ioasid;
> > 2) Whether one device is allowed to bind to multiple IOASID fd's;
> > 3) Carry device information in invalidation/fault reporting uAPI;
> > 4) What should/could be specified when allocating an IOASID;
> > 5) The protocol between vfio group and kvm;
> >
> ...
> >
> > For 5), I'd expect Alex to chime in. Per my understanding looks the
> > original purpose of this protocol is not about I/O address space. It's
> > for KVM to know whether any device is assigned to this VM and then
> > do something special (e.g. posted interrupt, EPT cache attribute, etc.).
>
> Right, the original use case was for KVM to determine whether it needs
> to emulate invlpg, so it needs to be aware when an assigned device is
> present and be able to test if DMA for that device is cache coherent.
> The user, QEMU, creates a KVM "pseudo" device representing the vfio
> group, providing the file descriptor of that group to show ownership.
> The ugly symbol_get code is to avoid hard module dependencies, ie. the
> kvm module should not pull in or require the vfio module, but vfio will
> be present if attempting to register this device.
>
> With kvmgt, the interface also became a way to register the kvm pointer
> with vfio for the translation mentioned elsewhere in this thread.
>
> The PPC/SPAPR support allows KVM to associate a vfio group to an IOMMU
> page table so that it can handle iotlb programming from pre-registered
> memory without trapping out to userspace.

To clarify that's a guest side logical vIOMMU page table which is
partially managed by KVM. This is an optimization - things can work
without it, but it means guest iomap/unmap becomes a hot path because
each map/unmap hypercall has to go
guest -> KVM -> qemu -> VFIO

So there are multiple context transitions.

> > Because KVM deduces some policy based on the fact of assigned device,
> > it needs to hold a reference to related vfio group. this part is irrelevant
> > to this RFC.
>
> All of these use cases are related to the IOMMU, whether DMA is
> coherent, translating device IOVA to GPA, and an acceleration path to
> emulate IOMMU programming in kernel... they seem pretty relevant.
>
> > But ARM's VMID usage is related to I/O address space thus needs some
> > consideration. Another strange thing is about PPC. Looks it also leverages
> > this protocol to do iommu group attach: kvm_spapr_tce_attach_iommu_
> > group. I don't know why it's done through KVM instead of VFIO uAPI in
> > the first place.
>
> AIUI, IOMMU programming on PPC is done through hypercalls, so KVM needs
> to know how to handle those for in-kernel acceleration. Thanks,

For PAPR guests, which is the common case, yes. Bare metal POWER
hosts have their own page table format. And probably some of the
newer embedded ppc models have some different IOMMU model entirely,
but I'm not familiar with it.

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature