Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of SWIOTLB bounce buffers
From: David Woodhouse
Date: Fri Apr 04 2025 - 03:51:07 EST

On Thu, 2025-04-03 at 23:35 -0700, Christoph Hellwig wrote:
> As stated above I suspect you still are asking the wrong question and
> have the wrong mental model. Virtio fundamentally can do DMA, you just
> don't want it to. And based on the other subthread I also suspect you'd
> actually be much better off with your bounce buffer in the virtual host
> memory instead of coming up with this particularly odd case, but even
> if not it's the system description that matters here, not the device
> model.

I do agree, this is fundamentally a system issue. In a CoCo model, it's
non-trivial for the system to allow *virtual* devices to do "DMA"
because that actually means allowing the VMM to access arbitrary guest
memory.

And most hardware designers screwed up the 2-stage IOMMU which allows
protection from *real* devices, and didn't build in a way for the VMM
to access guest memory through a stage1 mapping.

Fixing that to allow system DMA to those guest devices in a way which
will work across operating systems is impractical, as it involves the
guest knowing which PCI device is implemented where, and either core
kernel (share/unshare) enlightenments, or emulating a full IOMMU in the
*trusted* part of the hypervisor (e.g. pKVM).

So "for the emulated devices, just use a device model that doesn't do
arbitrary DMA to system memory" is a nice simple answer, and keeps the
guest support restricted to its *own* standalone driver.

And yes, virtio is kind of fundamentally DMA-based as you said, but
that's OK if the DMA is only to a specific range which is *known* to be
shareable and which the guest would never consider to be 'normal RAM'.
Because then the VMM can be *allowed* to access that.
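To make the "specific range" idea concrete, here is a minimal sketch of the check a device model could apply to every buffer address it is handed. It assumes a single contiguous shared window; the names `shared_window` and `dma_range_allowed` are hypothetical and come from neither the virtio spec nor any existing VMM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the one window the guest has agreed to share with the VMM. */
struct shared_window {
    uint64_t base;  /* start of the shareable range */
    uint64_t size;  /* length of the shareable range in bytes */
};

/* True only if [addr, addr + len) lies entirely inside the window. */
bool dma_range_allowed(const struct shared_window *w,
                       uint64_t addr, uint64_t len)
{
    if (len == 0 || len > w->size)
        return false;
    if (addr < w->base)
        return false;
    /* Compare offsets by subtraction so addr + len can never overflow. */
    return addr - w->base <= w->size - len;
}
```

Anything that fails the check is simply refused, so the VMM never needs a mapping of ordinary guest RAM at all.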
What's annoying is that this should work out of the box *already* with
virtio-mmio and a `restricted-dma-pool` — for systems which aren't
afflicted by UEFI/ACPI/PCI as their discovery mechanisms.

In fact, I may even try offering virtio-mmio in a PRP0001 ACPI device
to see how that works, although I think I'll lament the lack of MSI
interrupts.

But the cleaner solution seemed (at least when I started this
discussion) to be to add the same kind of facility to the virtio-pci
transport too. And I still don't think I've quite given up on it;
especially as you point out the use case for staged P2P DMA too.

I'll go take a closer look at how we can cleanly disambiguate between
address spaces.