Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

From: Cornelia Huck
Date: Tue Feb 25 2020 - 13:08:37 EST


On Mon, 24 Feb 2020 19:49:53 +0100
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

> On Mon, 24 Feb 2020 14:33:14 +1100
> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:
> > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > >
> > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:
> > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:
> > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote:
> > > > > > > > > From a user's perspective it makes absolutely perfect sense to use the
> > > > > > > > bounce buffers when they are NEEDED.
> > > > > > > > Forcing the user to specify iommu_platform just because you need bounce buffers
> > > > > > > > really feels wrong. And obviously we have a severe performance issue
> > > > > > > > because of the indirections.
> > > > > > >
> > > > > > > The point is that the user should not have to specify iommu_platform.
> > > > > > > We need to make sure any new hypervisor (especially one that might require
> > > > > > > bounce buffering) always sets it,
> > > > > >
> > > > > > So, I have draft qemu patches which enable iommu_platform by default.
> > > > > > But that's really because of other problems with !iommu_platform, not
> > > > > > anything to do with bounce buffering or secure VMs.
> > > > > >
> > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's the
> > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no reason
> > > > > > to know whether the guest needs bounce buffering. As far as the
> > > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > > just expects to get addresses it can access whether those are GPAs
> > > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).
> > > > >
> > > > > I very much agree!
> > > > >
> > > > > >
> > > > > > > as was a rather bogus legacy hack
> > > > > >
> > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > public spec and has been widely deployed for many years. We can't
> > > > > > just pretend it didn't happen and move on.
> > > > > >
> > > > > > Turning iommu_platform=on by default breaks old guests, some of which
> > > > > > we still care about. We can't (automatically) do it only for guests
> > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > ahead of time.

We could default to iommu_platform=on on s390 when the host has active
support for protected virtualization... but that's just another kind of
horrible, so let's just pretend I didn't suggest it.

> > > > >
> > > > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsoever,
> > > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > > are always physical addresses.
> > > >
> > > > > Fix the name then. The spec calls it ACCESS_PLATFORM now, which
> > > > makes much more sense.
> > >
> > > I don't quite get it. Sorry. Maybe I will revisit this later.
> >
> > Halil, I think I can clarify this.
> >
> > The "iommu_platform" flag doesn't necessarily have anything to do with
> > an iommu, although it often will. Basically it means "access guest
> > memory via the bus's normal DMA mechanism" rather than "access guest
> > memory using GPA, because you're the hypervisor and you can do that".
> >
>
> Unfortunately, I don't think this is what is conveyed to the end users.
> Let's see what we have documented:
>
> Neither the QEMU user documentation
> (https://www.qemu.org/docs/master/qemu-doc.html) nor online help:
> qemu-system-s390x -device virtio-net-ccw,?|grep iommu
> iommu_platform=<bool> - on/off (default: false)
> has any documentation on it.

Now, that's 'helpful' -- this certainly calls for a bit of documentation...

>
> But libvirt does have documentation on the knob that controls
> iommu_platform for QEMU (when QEMU is managed by libvirt):
> """
> Virtio-related options
>
> QEMU's virtio devices have some attributes related to the virtio
> transport under the driver element: The iommu attribute enables the use
> of emulated IOMMU by the device. The attribute ats controls the Address
> Translation Service support for PCIe devices. This is needed to make use
> of IOTLB support (see IOMMU device). Possible values are on or off.
> Since 3.5.0
> """
> (https://libvirt.org/formatdomain.html#elementsVirtio)
>
> Thus it seems the only available documentation says that it "enables the use
> of emulated IOMMU by the device".
>
> And for vhost-user we have
> """
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
>
> * Guest addresses map to the vhost memory region containing that guest
> address.
>
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
>
> * Guest addresses are also called I/O virtual addresses (IOVAs). They are
> translated to user addresses via the IOTLB.
> """
> (docs/interop/vhost-user.rst)
>
> > For the case of ccw, both mechanisms end up being the same thing,
> > since CCW's normal DMA *is* untranslated GPA access.
> >
>
> Nod.
>
> > For this reason, the flag in the spec was renamed to ACCESS_PLATFORM,
> > but the flag in qemu still has the old name.
> >
>
> Yes, the name in the spec is more neutral.
>
> > AIUI, Michael is saying you could trivially change the name in qemu
> > (obviously you'd need to alias the old name to the new one for
> > compatibility).
> >
>
> I could, and then I could also ask the libvirt guys to change <driver
> iommu='X'> to <driver access_platform='X'> or similar and to change
> their documentation to something that is harder to comprehend. Although
> I'm not sure they would like the idea.

Hopefully, the documentation can be changed to something that is _not_
harder to comprehend :) (with a bit more text, I suppose.) Renaming to
something like access_platform seems like a good idea, even with the
required compat dance.

>
> >
> > Actually, the fact that ccw has no translation makes things easier for
> > you: you don't really have any impediment to turning ACCESS_PLATFORM
> > on by default, since it doesn't make any real change to how things
> > work.
>
> Yeah, it should not, in theory, but currently it does in practice.
> Currently vhost will try to do the IOTLB dance (Jason has a patch that
> should help with that), and we get the 'use dma api' side effects in the
> guest (e.g. virtqueue's data go <2G + some overhead).

Nod.

>
> >
> > The remaining difficulty is that the virtio driver - since it can sit
> > on multiple buses - won't know this, and will reject the
> > ACCESS_PLATFORM flag, even though it could just do what it normally
> > does on ccw and it would work.
>
> Right, ACCESS_PLATFORM is a funny feature where the device refuses to
> work if the driver does not ack.
>
> >
> > For that case, we could consider a hack in qemu where for virtio-ccw
> > devices *only* we allow the guest to nack the ACCESS_PLATFORM flag and
> > carry on anyway. Normally we insist that the guest accept the
> > ACCESS_PLATFORM flag if offered, because on most platforms they
> > *don't* amount to the same thing.
>
> Jason found a nice way to differentiate between needs translation and
> does not need translation. But that patch still requires the ack by the
> driver (and as Michael has pointed out we have to consider migration).
>
> I'm afraid that F_IOMMU_PLATFORM means different things in different
> contexts, and that this ain't sufficiently documented. I'm tempted to do
> a proper write-up on this (let's hope my motivation and my time
> will allow). I would also very much like to have Conny's opinion on this.

More documentation is never a bad idea; but I'm afraid I don't have any
further insights at the moment.
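
For illustration only, the negotiation behaviour discussed above (the
device insists on an ack of ACCESS_PLATFORM when it offers it, plus the
proposed virtio-ccw-only relaxation) could be sketched roughly as below.
This is a minimal sketch under stated assumptions, not QEMU code: the
function, the enum and the relax_for_ccw switch are hypothetical names
made up for this example; only the feature bit number (33) comes from
the virtio spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio spec (formerly VIRTIO_F_IOMMU_PLATFORM). */
#define VIRTIO_F_ACCESS_PLATFORM 33

/* Hypothetical transport tag, used only by this sketch. */
enum sketch_transport { SKETCH_TRANSPORT_PCI, SKETCH_TRANSPORT_CCW };

/* Return true if the given feature bit is set in the 64-bit feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    return (features & (UINT64_C(1) << bit)) != 0;
}

/*
 * Hypothetical check: may the device accept the feature set acked by
 * the driver?
 *
 * relax_for_ccw models the hack suggested in the thread: on virtio-ccw
 * only, a driver that nacks ACCESS_PLATFORM is allowed to carry on,
 * because CCW's normal DMA is untranslated GPA access anyway.
 */
static bool sketch_features_ok(uint64_t offered, uint64_t acked,
                               enum sketch_transport transport,
                               bool relax_for_ccw)
{
    /* The driver must not ack anything the device did not offer. */
    if (acked & ~offered)
        return false;

    /* Nothing special to enforce if ACCESS_PLATFORM was not offered. */
    if (!has_feature(offered, VIRTIO_F_ACCESS_PLATFORM))
        return true;

    /* Offered and acked: the normal, accepted case. */
    if (has_feature(acked, VIRTIO_F_ACCESS_PLATFORM))
        return true;

    /* Offered but nacked: refuse, unless the ccw-only relaxation applies. */
    return relax_for_ccw && transport == SKETCH_TRANSPORT_CCW;
}
```

With relax_for_ccw set to false this models what Halil describes (the
device refuses to work without the ack); setting it to true only changes
the outcome for the ccw transport. Migration, which Michael pointed out
we have to consider, is not covered here.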
