Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

From: Ram Pai
Date: Tue May 01 2018 - 12:34:24 EST


On Wed, Apr 18, 2018 at 07:20:10PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 18, 2018 at 08:47:10AM +0530, Anshuman Khandual wrote:
> > On 04/15/2018 05:41 PM, Christoph Hellwig wrote:
> > > On Fri, Apr 06, 2018 at 06:37:18PM +1000, Benjamin Herrenschmidt wrote:
> > >>>> implemented as DMA API which the virtio core understands. There is no
> > >>>> need for an IOMMU to be involved for the device representation in this
> > >>>> case IMHO.
> > >>>
> > >>> This whole virtio translation issue is a mess. I think we need to
> > >>> switch it to the dma API, and then quirk the legacy case to always
> > >>> use the direct mapping inside the dma API.
> > >>
> > >> Fine with using a dma API always on the Linux side, but we do want to
> > >> special case virtio still at the arch and qemu side to have a "direct
> > >> mapping" mode. Not sure how (special flags on PCI devices) to avoid
> > >> actually going through an emulated IOMMU on the qemu side, because that
> > >> slows things down, esp. with vhost.
> > >>
> > >> IE, we can't I think just treat it the same as a physical device.
> > >
> > > We should have treated it like a physical device from the start, but
> > > that device has unfortunately sailed.
> > >
> > > But yes, we'll need a per-device quirk that says 'don't attach an
> > > iommu'.
> >
> > How about doing it per platform basis as suggested in this RFC through
> > an arch specific callback. Because all the virtio devices in the given
> > platform would require and exercise this option (to avail bounce buffer
> > mechanism for secure guests as an example). So the flag basically is a
> > platform specific one not a device specific one.
>
> That's not the case. A single platform can have a mix of virtio and
> non-virtio devices. Same applies even within virtio, e.g. the balloon
> device always bypasses an iommu. Further, QEMU supports out of process
> devices some of which might bypass the IOMMU.

Given that each virtio device has to behave differently depending on
(a) what it does (balloon, block, net, etc.),
(b) what platform it is on (pseries, x86, ...), and
(c) what environment it is in (secure, insecure, ...),

I think we should let the virtio device decide what it wants, instead
of forcing it NOT to use dma_ops when VIRTIO_F_IOMMU_PLATFORM is NOT
enabled.

Currently, the generic virtio code assumes that a device must NOT use
DMA operations if the hypervisor has NOT enabled VIRTIO_F_IOMMU_PLATFORM.
This assumption is baked into vring_use_dma_api(), though there is a
special exception for xen_domain().
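
For reference, the check described above can be modelled outside the
kernel roughly as follows. This is a simplified user-space sketch, not
the kernel source: struct virtio_device_model, has_feature() and the
running_on_xen field are stand-ins invented for illustration (the kernel
has its own feature test and xen_domain()). Only the feature bit number
is taken from the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical stand-in for struct virtio_device; illustration only. */
struct virtio_device_model {
	uint64_t features;	/* negotiated feature bits */
	bool running_on_xen;	/* stands in for xen_domain() */
};

/* Hypothetical helper: test one negotiated feature bit. */
bool has_feature(const struct virtio_device_model *vdev, unsigned int bit)
{
	assert(vdev != NULL);
	return (vdev->features >> bit) & 1;
}

/* Model of the decision as described in this mail. */
bool vring_use_dma_api_model(const struct virtio_device_model *vdev)
{
	/* Hypervisor asked for platform DMA handling: use the DMA API. */
	if (has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/* Special exception: Xen guests always use the DMA API. */
	if (vdev->running_on_xen)
		return true;

	/* Otherwise bypass dma_ops and use guest physical addresses. */
	return false;
}
```

Note that nothing in this decision looks at the device type, the
platform, or whether the guest is secure.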

This assumption prevents us from using the dma_ops abstraction for
virtio devices on a secure VM. BTW: VIRTIO_F_IOMMU_PLATFORM may or may
not be set on this platform.

On our secure VM, virtio devices do not, by default, share pages with
the hypervisor. In other words, the hypervisor cannot access the secure
VM's pages. The secure VM, with the help of the hardware, enables some
pages to be shared with the hypervisor. The secure VM then uses these
pages to bounce virtio data to and from the hypervisor.
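
The bouncing described above can be pictured with a small user-space
model. Everything here is hypothetical: shared_pool stands in for the
pages the secure VM has shared with the hypervisor, and bounce_map() /
bounce_unmap() are made-up names, not kernel or firmware interfaces.
The allocator is deliberately trivial (one in-flight buffer at a time).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SHARED_POOL_SIZE 4096

/* Stands in for memory the secure VM has shared with the hypervisor. */
static unsigned char shared_pool[SHARED_POOL_SIZE];
static bool shared_pool_busy;

/*
 * Copy private data into the shared pool and return the shared address
 * that would be handed to the hypervisor. Returns NULL if the request
 * does not fit or the single slot is already in use.
 */
void *bounce_map(const void *private_buf, size_t len)
{
	assert(private_buf != NULL);
	if (len > SHARED_POOL_SIZE || shared_pool_busy)
		return NULL;
	memcpy(shared_pool, private_buf, len);
	shared_pool_busy = true;
	return shared_pool;
}

/*
 * Copy whatever the hypervisor wrote back into the private buffer,
 * scrub the shared copy, and release the slot.
 */
void bounce_unmap(void *private_buf, size_t len)
{
	assert(private_buf != NULL && len <= SHARED_POOL_SIZE);
	assert(shared_pool_busy);
	memcpy(private_buf, shared_pool, len);
	memset(shared_pool, 0, len);
	shared_pool_busy = false;
}
```

The hypervisor only ever sees addresses inside shared_pool; the private
buffer is never exposed.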

One elegant way to implement this functionality is to abstract it
under our special dma_ops and wire it to the virtio devices.

However, the restriction imposed by the generic virtio code constrains
us from doing so.

If we can enrich vring_use_dma_api() to take multiple factors into
consideration and not just VIRTIO_F_IOMMU_PLATFORM, preferably by
consulting an arch-dependent function, we could seamlessly integrate
into the existing virtio infrastructure.
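
One possible shape for such a check, as a user-space sketch only. The
hook name arch_virtio_wants_dma_api(), struct vdev_model and its
secure_guest field are hypothetical; none of them is an existing kernel
interface, and how the hook would really be wired (weak symbol, platform
callback, ...) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical stand-in for struct virtio_device; illustration only. */
struct vdev_model {
	uint64_t features;	/* negotiated feature bits */
	bool running_on_xen;	/* stands in for xen_domain() */
	bool secure_guest;	/* assumption: guest memory hidden from hypervisor */
};

/*
 * Hypothetical arch hook. A real one could also look at the device
 * type or other platform state; here it only checks for a secure guest.
 */
bool arch_virtio_wants_dma_api(const struct vdev_model *vdev)
{
	assert(vdev != NULL);
	return vdev->secure_guest;
}

/* Sketch of the enriched decision proposed in this mail. */
bool vring_use_dma_api_proposed(const struct vdev_model *vdev)
{
	assert(vdev != NULL);

	/* Existing behaviour: the feature bit forces the DMA API. */
	if ((vdev->features >> VIRTIO_F_IOMMU_PLATFORM) & 1)
		return true;

	/* Existing behaviour: Xen exception. */
	if (vdev->running_on_xen)
		return true;

	/* New: let the architecture opt the device into dma_ops. */
	return arch_virtio_wants_dma_api(vdev);
}
```

With this shape, a legacy device on an ordinary guest still bypasses
dma_ops, while the same device on a secure guest goes through them even
when VIRTIO_F_IOMMU_PLATFORM is not set.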

RP
>
> --
> MST

--
Ram Pai