Re: [RFC PATCH 0/2] VFIO no-iommu

From: Alex Williamson
Date: Sun Oct 11 2015 - 15:25:47 EST


On Sun, 2015-10-11 at 21:29 +0300, Michael S. Tsirkin wrote:
> On Sun, Oct 11, 2015 at 09:28:09PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Oct 09, 2015 at 12:40:56PM -0600, Alex Williamson wrote:
> > > Recent patches for UIO have been attempting to add MSI/X support,
> > > which unfortunately implies DMA support, which users have been
> > > enabling anyway, but was never intended for UIO. VFIO on the other
> > > hand expects an IOMMU to provide isolation of devices, but provides
> > > a much more complete device interface, which already has full
> > > MSI/X support. There's really no way to support userspace drivers
> > > with DMA capable devices without an IOMMU to protect the host, but
> > > we can at least think about doing it in a way that properly taints
> > > the kernel and avoids creating new code duplicating existing code,
> > > that does have a supportable use case.
> > >
> > > The diffstat is only so large because I moved vfio.c to vfio_core.c
> > > so I could more easily keep the module named vfio.ko while keeping
> > > the bulk of the no-iommu support in a separate file that can be
> > > optionally compiled. We're really looking at a couple hundred lines
> > > of mostly stub code. The VFIO_NOIOMMU_IOMMU could certainly be
> > > expanded to do page pinning and virt_to_bus() translation, but I
> > > didn't want to complicate anything yet.
> >
> > I think it's already useful like this, since all current users
> > seem happy enough to just use hugetlbfs to do pinning, and
> > ignore translation.

That was sort of my thought too...
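
As a rough illustration of what "ignore translation" means for a
userspace driver today (this is not part of the series): the driver
typically reads the physical address of each pinned hugepage from
/proc/self/pagemap and hands it to the device as the bus address. Below
is a minimal sketch of decoding one pagemap entry. It assumes the
documented pagemap layout (bit 63 = page present, bits 0-54 = page frame
number in units of the base page size) and assumes bus address ==
physical address on the platform. pagemap_entry_to_phys() is a
hypothetical helper name, not an existing API.

```c
#include <stdint.h>

#define PAGEMAP_PRESENT  (1ULL << 63)         /* page is resident in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)   /* bits 0-54: page frame number */

/*
 * Decode one 64-bit /proc/self/pagemap entry into a physical address.
 * base_page_size is the system base page size (e.g. 4096), which is the
 * unit pagemap uses even for hugepage-backed mappings.
 * Returns 0 on success, -1 if the page is not present.
 */
static int pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
                                 uint64_t base_page_size, uint64_t *phys)
{
    if (!(entry & PAGEMAP_PRESENT))
        return -1;

    /* PFN * page size gives the frame base; add the offset within it. */
    *phys = (entry & PAGEMAP_PFN_MASK) * base_page_size
            + (vaddr % base_page_size);
    return 0;
}
```

The caller would pread() 8 bytes at offset (vaddr / base_page_size) * 8
from /proc/self/pagemap to obtain the entry. Recent kernels only report
the PFN to privileged processes, so an unprivileged reader may see a
zero frame number.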

> > > I've only compiled this and tested loading the module with the new
> > > no-iommu mode enabled; I haven't actually tried to port a DPDK
> > > driver to it, though it ought to be a pretty obvious mix of the
> > > existing UIO and VFIO versions (set the IOMMU, but avoid using it
> > > for mapping, use however bus translations are done w/ UIO). The core
> > > vfio device file is still /dev/vfio/vfio, but all the groups become
> > > /dev/vfio-noiommu/$GROUP.
> > >
> > > It should be obvious, but I always feel obligated to state that this
> > > does not and will not ever enable device assignment to virtual
> > > machines on non-IOMMU capable platforms.
> >
> > In theory, it's kind of possible using paravirtualization.
> >
> > Within guest, you'd make map_page retrieve the io address from the host
> > and return that as dma_addr_t. The only question would be APIs that
> > require more than one contiguous page in IO space (e.g. I think alloc
> > coherent is like this?).
> > Not a problem if host is using hugetlbfs, but if not, I guess we could
> > add a hypercall and some Linux API on the host to trigger compaction
> > on the host aggressively. MADV_CONTIGIOUS?
>
> Not that I see a good reason for that.
> Just use an iommu.

Right, I think it boils down to how much code we are willing to maintain
for an interface that we consider so dangerous and unsupportable that we
immediately taint the kernel. This is partially why I stopped short of
expanding the no-iommu interface to do page pinning or virt-to-bus
translation. A few hundred lines of boilerplate stubs to enable re-use
of code that is maintained for supportable interfaces is one thing.
Building onto that with paravirtual IOMMU interfaces to a VM for
something that ultimately cannot be supported is not something I want to
participate in. Get an IOMMU. Thanks,

Alex
