Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of SWIOTLB bounce buffers

From: Michael S. Tsirkin
Date: Thu Apr 03 2025 - 09:19:29 EST


On Thu, Apr 03, 2025 at 09:22:57AM +0100, David Woodhouse wrote:
> On Thu, 2025-04-03 at 04:13 -0400, Michael S. Tsirkin wrote:
> > On Thu, Apr 03, 2025 at 08:54:45AM +0100, David Woodhouse wrote:
> > > On Thu, 2025-04-03 at 03:34 -0400, Michael S. Tsirkin wrote:
> > > >
> > > > Indeed I personally do not exactly get why implement a virtual system
> > > > without an IOMMU when virtio-iommu is available.
> > > >
> > > > I have a feeling it's about lack of windows drivers for virtio-iommu
> > > > at this point.
> > >
> > > And a pKVM (etc.) implementation of virtio-iommu which would allow the
> > > *trusted* part of the hypervisor to know which guest memory should be
> > > shared with the VMM implementing the virtio device models?
> >
> > Is there a blocker here?
>
> Only the amount of complexity in what should be a minimal Trusted
> Compute Base. (And ideally subject to formal methods of proving its
> correctness too.)

Shrug. Does not have to be complex. Could be a "simple mode" for
virtio-iommu where it just accepts one buffer. No?

> And frankly, if we were going to accept a virtio-iommu in the TCB why
> not just implement enough virtqueue knowledge to build something where
> the trusted part just snoops on the *actual* e.g. virtio-net device to
> know which buffers the VMM was *invited* to access, and facilitate
> that?

Because it's awful? Buffers are a datapath thing. Stay away from there.

> We looked at doing that. It's awful.

Indeed.

> > > You'd also end up in a situation where you have a virtio-iommu for some
> > > devices, and a real two-stage IOMMU (e.g. SMMU or AMD's vIOMMU) for
> > > other devices. Are guest operating systems going to cope well with
> > > that?
> >
> > They should. In particular because systems with multiple IOMMUs already
> > exist.
> >
> > > Do the available discovery mechanisms for all the relevant IOMMUs
> > > even *allow* for that to be expressed?
> >
> > I think yes. But, it's been a while since I played with this, let me
> > check what works, what does not, and get back to you on this.
>
> Even if it could work in theory, I'll be astonished if it actually
> works in practice across a wide set of operating systems, and if it
> *ever* works for Windows.

Well, it used to work. I won't have time to play with it until sometime
next week, if it's relevant. If I poke at my Windows system, I see

> Compared with the simple option of presenting a device which
> conceptually doesn't even *do* DMA, which is confined to its own
> modular device driver...

I'm not (yet) nacking this hack, though I already heartily dislike the
fact that it is mostly a PV-only thing, since it cannot be offloaded to
a real device efficiently *and* requires copies to move data between
devices. But let's see if more issues surface.


--
MST