Re: [RFC PATCH 2/2] vfio: Include no-iommu mode
From: Alex Williamson
Date: Mon Oct 12 2015 - 14:08:15 EST
On Mon, 2015-10-12 at 11:46 -0600, Alex Williamson wrote:
> On Mon, 2015-10-12 at 19:27 +0300, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2015 at 08:56:07AM -0700, Stephen Hemminger wrote:
> > > On Fri, 09 Oct 2015 12:41:10 -0600
> > > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> > >
> > > > There is really no way to safely give a user full access to a PCI
> > > > device without an IOMMU to protect the host from errant DMA. There is
> > > > also no way to provide DMA translation, for use cases such as device
> > > > assignment to virtual machines. However, there are still those users
> > > > that want userspace drivers under those conditions. The UIO driver
> > > > exists for this use case, but does not provide the degree of device
> > > > access and programming that VFIO has. In an effort to avoid code
> > > > duplication, this introduces a No-IOMMU mode for VFIO.
> > > >
> > > > This mode requires enabling CONFIG_VFIO_NOIOMMU and loading the vfio
> > > > module with the option "enable_unsafe_pci_noiommu_mode". This should
> > > > make it very clear that this mode is not safe. In this mode, there is
> > > > no support for unprivileged users; CAP_SYS_ADMIN is required for
> > > > access to the necessary dev files. Mixing no-iommu and secure VFIO is
> > > > also unsupported, as are any VFIO IOMMU backends other than the
> > > > vfio-noiommu backend. Furthermore, unsafe group files are relocated
> > > > to /dev/vfio-noiommu/. Upon successful loading in this mode, the
> > > > kernel is tainted due to the dummy IOMMU put in place. Unloading of
> > > > the module in this mode is also unsupported and will BUG due to the
> > > > lack of support for unregistering an IOMMU for a bus type.
> > > >
> > > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > >
> > > Will this work for distros where changing kernel command line options
> > > is really not that practical? We need to boot with one command line
> > > and then decide to use IOMMU (or not) later on during the service
> > > startup of the dataplane application.
> >
> > On open? That's too late in my opinion. But maybe the flag can be
> > tweaked so that it will probe for an IOMMU: if one is there, do the
> > right thing, but if that fails, enable the dummy one.
> > And maybe defer tainting until device open.
I forgot to address the tainting point; I think we were previously
talking about tainting at the point where bus master is enabled, but I
chose to do it much earlier here because the act of registering a dummy
iommu_ops for a bus type is pretty much the point at which we have a
good chance of breaking the system. I also considered that some devices
can manipulate their config space registers using device specific
registers (such as the example of GPUs that mirror config space in
mmio). It's therefore not always possible to taint at the point where
we think the user has done something bad. The best case would be
tainting at the point where the device file descriptor is opened as you
suggested, but we can't do that while we're exposing dummy iommu_ops to
the whole bus type. Maybe another option would be to create vfio
wrappers for the iommu callbacks to have the iommu facade more local to
vfio.
My main requirements are that I do not want to be disruptive to the
existing vfio code or add a significant amount of code that needs to be
maintained for the purpose of supporting a use mode that we don't really
think is supportable. Thanks,
Alex
> The vfio mechanics are that a vfio bus driver, such as vfio-pci, binds to
> a device. In the probe function, we check for an iommu group, which
> vfio-core then uses to create the vfio group. So there's nothing to
> open(), the iommu association needs to be made prior to even binding the
> device to vfio-pci. Probing for an iommu can also only be done on a per
> bus_type basis, which will likely eventually become a per bus instance
> to support heterogeneous iommus, so vfio can't simply determine that an
> iommu is not present globally. This is why the new module option
> includes the word "pci", so that it can probe for and attach the dummy
> iommu specifically on the pci_bus_type.
>
> We can still consider if there are better points at which to initiate
> the fake iommu group. I'm trying to think through having vfio-pci do it
> in probe(), but it seems pretty ugly.
>
> In this RFC, I specifically avoided making the vfio no-iommu iommu
> driver just another modular iommu backend; I wanted it to be tied to a
> vfio module option such that vfio behaves differently with open()s and
> certain ioctls. I think it would be really confusing to users if safe
> and unsafe modes could be used concurrently for different devices.
>
> > Won't address the "old IOMMUs add performance overhead"
> > usecase but I'm not very impressed by that in any case.
>
> Yep, me neither, certainly not for static mappings. There's a lot of
> FUD left over from latencies in the streaming DMA mapping paths where
> mappings are created and destroyed at a high rate. That has more to do
> with flushing mappings out of the hardware than with iotlb miss latency
> or actual translation, which is all that should be in play for most uses
> here.
>
> > > Recent experience is that IOMMUs
> > > are broken on many platforms, so the only way to make a DPDK application
> > > work is to write a test program that can be used to check if VFIO+IOMMU
> > > works first.
> >
> > In userspace? Well that's just piling up work-arounds. And assuming
> > hardware is broken, who knows what's going on security-wise. These
> > broken systems need to be identified and black-listed in kernel.
> >
> > > Also, although you think the long option will set the bar high
> > > enough, it probably will not satisfy anyone. It is annoying enough that
> > > I would just carry a patch to remove the silly requirement.
> >
> > That sounds reasonable. Anyone who can carry a kernel patch
> > does not need the warning.
> >
> > > And the people who believe
> > > all user mode DMA is evil won't be satisfied either.
> > > But I really like having the same consistent API for handling device
> > > access with IOMMU and when IOMMU will/won't work.
> >
> > I agree that's good. Makes it easier to migrate applications to
> > the safe configuration down the road. Thanks Alex!
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/