Re: [RFC 0/4] Virtio uses DMA API for all devices

From: Michael S. Tsirkin
Date: Thu Aug 02 2018 - 17:51:45 EST

On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:
> > > Yes, this is the purpose of Anshuman original patch (I haven't looked
> > > at the details of the patch in a while but that's what I told him to
> > > implement ;-) :
> > >
> > > - Make virtio always use DMA ops to simplify the code path (with a set
> > > of "transparent" ops for legacy)
> > >
> > > and
> > >
> > > - Provide an arch hook allowing us to "override" those "transparent"
> > > DMA ops with some custom ones that do the appropriate swiotlb gunk.
> > >
> > > Cheers,
> > > Ben.
> > >
> >
> > Right but as I tried to say doing that brings us to a bunch of issues
> > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
> > guest to hypervisor communication.
> I'm not sure I see the problem, see below
> > When we do (as is the case with PLATFORM_IOMMU right now) this adds a
> > bunch of overhead which we need to get rid of if we are to switch to
> > PLATFORM_IOMMU by default. We need to fix that.
> So let's differenciate the two problems of having an IOMMU (real or
> emulated) which indeeds adds overhead etc... and using the DMA API.

Well actually it's the other way around. An iommu in theory doesn't need
to bring overhead if you set it in bypass mode. Which does imply the
iommu supports bypass mode. Is that universally the case? DMA API does
see Christoph's list of things it does some of which add overhead.

> At the moment, virtio does this all over the place:
> if (use_dma_api)
> dma_map/alloc_something(...)
> else
> use_pa
> The idea of the patch set is to do two, somewhat orthogonal, changes
> that together achieve what we want. Let me know where you think there
> is "a bunch of issues" because I'm missing it:
> 1- Replace the above if/else constructs with just calling the DMA API,
> and have virtio, at initialization, hookup its own dma_ops that just
> "return pa" (roughly) when the IOMMU stuff isn't used.
> This adds an indirect function call to the path that previously didn't
> have one (the else case above). Is that a significant/measurable
> overhead ?

Seems to be :( Jason reports about 4%. I wonder whether we can support
map_sg and friends being NULL, then use that when mapping is an
identity. A conditional branch there is likely very cheap.

Would this cover all platforms with kvm (which is where we care
most about performance)?

> This change stands alone, and imho "cleans" up virtio by avoiding all
> that if/else "2 path" and unless it adds a measurable overhead, should
> probably be done.
> 2- Make virtio use the DMA API with our custom platform-provided
> swiotlb callbacks when needed, that is when not using IOMMU *and*
> running on a secure VM in our case.
> This benefits from -1- by making us just plumb in a different set of
> DMA ops we would have cooked up specifically for virtio in our arch
> code (or in virtio itself but build arch-conditionally in a separate
> file). But it doesn't strictly need it -1-:
> Now, -2- doesn't strictly needs -1-. We could have just done another
> xen-like hack that forces the DMA API "ON" for virtio when running in a
> secure VM.
> The problem if we do that however is that we also then need the arch
> PCI code to make sure it hooks up the virtio PCI devices with the
> special "magic" DMA ops that avoid the iommu but still do swiotlb, ie,
> not the same as other PCI devices. So it will have to play games such
> as checking vendor/device IDs for virtio, checking the IOMMU flag,
> etc... from the arch code which really bloody sucks when assigning PCI
> DMA ops.
> However, if we do it the way we plan here, on top of -1-, with a hook
> called from virtio into the arch to "override" the virtio DMA ops, then
> we avoid the problem completely: The arch hook would only be called by
> virtio if the IOMMU flag is *not* set. IE only when using that special
> "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal
> PCI dma ops as usual.
> That way, we have a very clear semantic: This hook is purely about
> replacing those "null" DMA ops that just return PA introduced in -1-
> with some arch provided specially cooked up DMA ops for non-IOMMU
> virtio that know about the arch special requirements. For us bounce
> buffering.
> Is there something I'm missing ?
> Cheers,
> Ben.

Right so I was trying to write it up in a systematic way, but just to
give you one example, if there is a system where DMA API handles
coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU
flag causes that to happen.

And we kinda worked around this without the IOMMU by basically saying
"ok we do not really need DMA API so let's just bypass it" and
it was kind of ok except now everyone is switching to vIOMMU
just in case. So now people do want some parts of what DMA API does,
such as the bounce buffer use, or IOMMU mappings.

And maybe in the end the solution is going to be to do something similar
to virt_Xmb except for DMA APIs: add APIs that handle just the
addressing bits but without the overhead. See
commit 6a65d26385bf487926a0616650927303058551e3
asm-generic: implement virt_xxx memory barriers
for reference, it's a similar set of issues.

So it's not a problem with your patches as such, it's just that
they don't solve that harder problem.