Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of SWIOTLB bounce buffers
From: David Woodhouse
Date: Fri Apr 04 2025 - 06:15:57 EST
On Fri, 2025-04-04 at 10:27 +0100, David Woodhouse wrote:
> On 4 April 2025 09:32:39 BST, "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> wrote:
> > On Fri, Apr 04, 2025 at 09:16:44AM +0100, David Woodhouse wrote:
> > > On Fri, 2025-04-04 at 04:09 -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Apr 04, 2025 at 08:50:47AM +0100, David Woodhouse
> > > > wrote:
> > > > > What's annoying is that this should work out of the box
> > > > > *already* with
> > > > > virtio-mmio and a `restricted-dma-pool` — for systems which
> > > > > aren't
> > > > > afflicted by UEFI/ACPI/PCI as their discovery mechanisms.
> > > >
> > > >
> > > > That specifically would be just a driver bugfix then?
> > >
> > > I actually think it works out of the box and there isn't even a
> > > bug to
> > > fix. Haven't tested yet.
> > >
> > > The sad part is that the system does it all automatically *if* it
> > > has
> > > CONFIG_DMA_RESTRICTED_POOL (e.g. Linux) and the driver never even
> > > notices that the dma_ops it's using are the swiotlb ops using the
> > > provided buffer.
> > >
> > > Which is *kind* of nice... except that when on a guest OS which
> > > *isn't*
> > > Linux with CONFIG_DMA_RESTRICTED_POOL, the guest will just ignore
> > > the
> > > `restricted-dma-pool` node and try DMA to system memory anyway,
> > > which
> > > will fail.
> >
> > I mean, it's easy to misconfigure Linux, this is why we love it ;)
> > Why
> > is this such a concern?
>
> Because it's incompatible. In the DT world, perhaps this new *non-
> optional* feature/restriction should have come with a new
> "compatible" string such as "virtio-mmio-restricted-dma".
>
> Adding it without backwards compatibility wasn't ideal.
>
> > > That's why my proposal adds the negotiated VIRTIO_F_SWIOTLB
> > > feature, so
> > > that the device side can refuse, if the guest *isn't* agreeing to
> > > use
> > > the bounce buffer in the situations where it must do so.
> >
> >
> > OTOH then setting this feature and if you make the device force it,
> > you are breaking guests restricted-dma-pool which worked
> > previously, no?
>
> Yes. So a platform offering virtio-mmio with restricted DMA, if the
> driver doesn't accept the offered VIRTIO_F_SWIOTLB, may want to
> accept that negotiation anyway, and *hope* that the driver/OS are
> going to use the buffer anyway.
>
> I just didn't want to make that same mistake again when formalising
> and documenting this, and especially when attempting to extend it to
> PCI.
Of course, the beauty of the restricted-dma-pool as supported by DT is
that it's a *system* memory buffer, which is actually OK as long as
it's reserved address space and not just part of normal system memory
that an unsuspecting guest might use for general purposes. So the
trusted part of the hypervisor (e.g. pKVM) can *allow* the VMM access
to that space.
It doesn't *have* to be on-device. That just seemed like the more
natural way to do it for PCI.
I suppose we *could* allow for the virtio-pci transport to do it the
same way as virtio-mmio though. The VIRTIO_PCI_CAP_SWIOTLB capability¹
could reference a range of system memory space, just like the
`restricted-dma-pool` property does.
It's a weird abstraction especially for a physical PCI device to do
that because the system memory space is outside its ownership. But in a
physical device it could be writable, and you could consider it the
responsibility of the system firmware to configure it appropriately, in
accordance with the IOMMU and other DMA restrictions of the platform.
That does solve it for the CoCo case without addressing the P2P staging
case that Christoph mentions, though.
¹ I will rename it, Christoph, if it survives at all. Probably
VIRTIO_F_RESTRICTED_DMA and VIRTIO_PCI_CAP_RESTRICTED_DMA but of course
it depends on the semantics we conclude it should have.
Attachment:
smime.p7s
Description: S/MIME cryptographic signature