Re: [RFC] Simple device assignment with VFIO platform
From: Mostafa Saleh
Date: Tue Oct 01 2024 - 06:15:19 EST
Hi Alex,
On Mon, Sep 30, 2024 at 11:10:13AM -0600, Alex Williamson wrote:
> On Fri, 27 Sep 2024 17:17:02 +0100
> Mostafa Saleh <smostafa@xxxxxxxxxx> wrote:
>
> > Hi All,
> >
> > Background
> > ==========
> > I have been looking into assigning simple devices which are not DMA
> > capable to VMs on Android using VFIO platform.
> >
> > I have been mainly looking at this with respect to Protected KVM (pKVM),
> > which would need some extra modifications, mostly to KVM-VFIO. That work
> > is at an early prototyping stage at the moment, and it has core pending
> > pKVM dependencies upstream, such as guest memfd[1] and IOMMU support[2].
> >
> > However, this problem is not pKVM (or KVM) specific; it is about the
> > design of VFIO.
> >
> > [1] https://lore.kernel.org/kvm/20240801090117.3841080-1-tabba@xxxxxxxxxx/
> > [2] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@xxxxxxxxxx/
> >
> > Problem
> > =======
> > At the moment, VFIO platform will deny a device from probing (through
> > vfio_group_find_or_alloc()) if it’s not part of an IOMMU group,
> > unless CONFIG_VFIO_NOIOMMU is configured.
> >
> > As far as I understand, the current solutions to pass through platform
> > devices that are not DMA capable are:
> > - Use VFIO platform + CONFIG_VFIO_NOIOMMU: The problem with that is that
> > it taints the kernel, and it doesn’t actually fit the device description,
> > as the device doesn’t only lack an IOMMU, it’s not DMA capable at
> > all, so the kernel should be safe with assigning the device without
> > DMA isolation.
>
> If the device is not capable of DMA, then what do you get from using
> vfio? Essentially the device is reduced to some MMIO ranges and
> something to configure line level interrupt notification.
> Traditionally this is the realm of UIO.
My simplistic understanding was that VFIO mainly deals with device passthrough
to VMs while UIO deals with userspace drivers.
Also, it seems that UIO lacks eventfd support for IRQs, which makes it
inefficient to use with KVM.
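To illustrate the eventfd point, here is a minimal userspace sketch of how a
VMM can attach an eventfd to a VFIO device interrupt with
VFIO_DEVICE_SET_IRQS. The helper names build_irq_set() and
vfio_irq_to_eventfd() are hypothetical and exist only for this example; it
assumes a single interrupt at sub-index 0 and an already opened VFIO device
fd, and it is not taken from any existing VMM.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the VFIO_DEVICE_SET_IRQS payload that attaches
 * one eventfd as the trigger for interrupt `index`. The caller frees it.
 */
static struct vfio_irq_set *build_irq_set(uint32_t index, int32_t efd)
{
	size_t sz = sizeof(struct vfio_irq_set) + sizeof(efd);
	struct vfio_irq_set *set;

	assert(efd >= 0);
	set = calloc(1, sz);
	if (!set)
		return NULL;

	set->argsz = (uint32_t)sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = index;
	set->start = 0;	/* assumption: first (only) sub-index */
	set->count = 1;	/* one eventfd follows in data[] */
	memcpy(set->data, &efd, sizeof(efd));
	return set;
}

/*
 * Hypothetical helper: create an eventfd and bind it to the device
 * interrupt. Returns the eventfd on success, -1 on failure.
 */
static int vfio_irq_to_eventfd(int device_fd, uint32_t index)
{
	int efd = eventfd(0, EFD_CLOEXEC);
	struct vfio_irq_set *set;
	int ret;

	if (efd < 0)
		return -1;

	set = build_irq_set(index, efd);
	ret = set ? ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) : -1;
	free(set);
	if (ret < 0) {
		close(efd);
		return -1;
	}
	return efd;
}
```

The returned eventfd could then be handed to KVM with the KVM_IRQFD ioctl, so
the interrupt reaches the guest without a round trip through the VMM. With
UIO, as far as I can tell, userspace instead has to read() the UIO device
node to learn about an interrupt.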
>
> > - Use VFIO mdev with an emulated IOMMU: this seems like it could work, but
> > much of the code would be duplicated from the VFIO platform code, as the
> > device is a platform device.
>
> Per Eric's talk recently at KVM Forum[1] we're already at an inflection
> point for vfio-platform. We're suffering from lack of contributions
> for any current devices, agreement in the community to end it as a
> failed experiment, while at the same time vendors quietly indicate they
> depend on it. It seems that at a minimum, we can't support
> vfio-platform like we do vfio-pci, where a meta driver pretends it can
> support exposing any platform device. There's not enough definition to
> a platform device. Therefore if vfio-platform is to survive, it's
> probably going to need to do so through device specific drivers which
> understand how a specific device operates, and potentially whether it
> can or cannot perform DMA. That might mean that vfio-platform needs to
> take the mdev or vfio-pci variant driver approach, and the code
> duplication you're concerned about should instead be refactoring in
> order to re-use the existing code from more device specific drivers.
I see, that makes sense, but I guess that won't progress much without
vendors contributing drivers :/
> > - Use UIO: It can map MMIO to userspace, but it seems to be focused on
> > userspace drivers rather than VM passthrough, and I can’t find support
> > for it in QEMU.
>
> This would need to be device specific code on the QEMU side, so there's
> probably not much to share here.
>
> > One other benefit of supporting this in VFIO platform is that we can
> > use the existing UAPI for platform devices (and its support in VMMs).
>
> But it's not like there's ubiquitous support for vfio-platform devices
> in QEMU either. Each platform device needs hooks to at least setup
> device tree entries to describe the device to the VM. AIUI, QEMU needs
> to understand the device and how to describe it to the VM whether the
> approach is vfio-platform or UIO.
Makes sense, although it's sad there is no upstream support for any
device.
>
> > Proposal
> > ========
> > Extend VFIO platform to allow assigning devices without an IOMMU. This
> > could possibly be done by:
> > - Checking device capability from the platform bus (this would be
> > something ACPI/OF specific, similar to how DMA is configured from
> > platform_dma_configure(); we could add a new function, something like
> > platform_dma_capable())
> >
> > - Using an emulated IOMMU for such devices
> > (vfio_register_emulated_iommu_dev()), instead of making intrusive
> > changes around IOMMU existence.
> >
> > If that makes sense, I can work on an RFC (I don’t have any code at the
> > moment).
>
> As noted in the thread referenced by Eric, I don't think we want to add
> any sort of vfio no-iommu into QEMU. vfio-platform in particular is in
> no position to drive such a feature. If you want to use vfio for this,
> the most viable approach would seem to be one of using an emulated
> IOMMU in a device specific context which can understand the device is
> not capable of DMA. We likely need to let vfio-platform die as a generic
> means to expose arbitrary platform devices. Thanks,
I see, thanks a lot for the feedback, it seems vfio-platform is not the
right place for this, and it'd be better to have separate drivers that
register vfio devices, and maybe we can have more common code for platform
devices as this progresses.
Thanks,
Mostafa
>
> Alex
>
> [1] https://www.youtube.com/watch?v=Q5BOSbtwRr8
>