Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Wang
Date: Mon Jun 07 2021 - 21:20:48 EST

On 2021/6/8 3:41 AM, Alex Williamson wrote:
On Mon, 7 Jun 2021 16:08:02 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

On Mon, Jun 07, 2021 at 12:59:46PM -0600, Alex Williamson wrote:

It is up to qemu whether it wants to proceed or not. There is no issue with
allowing the use of no-snoop while blocking wbinvd, other than that some
drivers may malfunction. If the user is certain they don't have
malfunctioning drivers, then there is no issue with going ahead.
A driver that knows how to use the device in a coherent way can
certainly proceed, but I suspect that's not something we can ask of
QEMU. QEMU has no visibility into the in-use driver and a sketchy ability
to virtualize the no-snoop enable bit to prevent non-coherent DMA from
the device. There might be an experimental ("x-" prefixed) QEMU device
option to allow user override, but QEMU should disallow the possibility
of malfunctioning drivers by default. If we have devices that probe as
supporting no-snoop, but actually can't generate such traffic, we might
need a quirk list somewhere.
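A minimal sketch of the default-deny policy described above, seen from QEMU's side and assuming the decision can be reduced to a few inputs. Everything here is hypothetical: `may_assign_with_wbinvd_blocked()`, `struct nosnoop_quirk` and the `x_override` flag (standing in for an experimental "x-" device option) are illustrative names, not existing QEMU code or options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk entry: a device that probes as supporting No Snoop
 * but is known never to generate non-coherent (no-snoop) DMA. */
struct nosnoop_quirk {
	uint16_t vendor;
	uint16_t device;
};

static bool nosnoop_quirked(uint16_t vendor, uint16_t device,
			    const struct nosnoop_quirk *quirks, size_t n)
{
	assert(quirks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (quirks[i].vendor == vendor && quirks[i].device == device)
			return true;
	return false;
}

/* Hypothetical: may QEMU assign this device while wbinvd is blocked?
 * Default-deny unless the device cannot issue no-snoop DMA, is on the
 * quirk list, or the user explicitly opted in via an "x-" style option. */
bool may_assign_with_wbinvd_blocked(uint16_t vendor, uint16_t device,
				    bool nosnoop_capable, bool x_override,
				    const struct nosnoop_quirk *quirks, size_t n)
{
	if (!nosnoop_capable)
		return true;	/* coherent-only device: nothing to emulate */
	if (nosnoop_quirked(vendor, device, quirks, n))
		return true;	/* advertises No Snoop but never uses it */
	return x_override;	/* otherwise only on explicit user override */
}
```

The quirk list is passed in rather than hard-coded so that the sketch does not have to name any real device IDs.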
Compatibility is important, but when I look in the kernel code I see
very few places that call wbinvd(). As far as anything relevant to qemu
goes, they are basically all in DRM.

That tells me that the vast majority of PCI devices do not generate
no-snoop traffic.
Unfortunately, even just looking at devices across a couple of laptops,
most devices do support No Snoop and have NoSnoop+ set by default. I
don't notice anything in the kernel that actually tries to set this
enable bit (a handful of drivers actively disable it), so I assume it's
done by the firmware.

I wonder whether or not it was done via ACPI:

6.2.17 _CCA (Cache Coherency Attribute)

The _CCA object returns whether or not a bus-master device supports hardware managed cache coherency. Expected values are 0 to indicate it is not supported, and 1 to indicate that it is supported. All other values are reserved.

On Intel platforms, if the _CCA object is not supplied, the OSPM will assume the devices are hardware cache coherent.
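The quoted _CCA semantics can be written down as a small decision function. This is only a sketch of the text above, assuming the caller already knows whether firmware supplied _CCA and what value it returned; `interpret_cca()` and `enum cca_result` are hypothetical names, not ACPICA or kernel API, and the "absent means coherent" default models only the Intel-platform rule from the quote.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a bus-master device's _CCA result. */
enum cca_result {
	CCA_NOT_COHERENT,	/* _CCA returned 0 */
	CCA_COHERENT,		/* _CCA returned 1, or absent on Intel */
	CCA_RESERVED,		/* any other value is reserved */
};

/* `present`: did firmware supply a _CCA object for the device?
 * `value`:   what it returned (ignored when not present). */
enum cca_result interpret_cca(bool present, unsigned long long value)
{
	if (!present)
		return CCA_COHERENT;	/* Intel default per the quoted text */

	switch (value) {
	case 0:
		return CCA_NOT_COHERENT;
	case 1:
		return CCA_COHERENT;
	default:
		return CCA_RESERVED;
	}
}
```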

It's not safe for QEMU to assume that only GPUs will actually make use
of it.

I think it makes the software design much simpler if the security
check is very simple. Possessing a suitable device in an ioasid fd
container is enough to flip on the feature and we don't need to track
changes from that point on. We don't need to revoke wbinvd if the
ioasid fd changes, for instance. Better to keep the kernel very simple
in this regard.
You're suggesting that a user isn't forced to give up wbinvd emulation
if they lose access to their device?
Sure, why do we need to be stricter? It is the same logic I gave
earlier: once an attacker process has access to wbinvd, it can just
keep that access indefinitely.

The main use case for revocation assumes that qemu would be
compromised after a device is hot-unplugged and you want to block off
wbinvd. But I have a hard time seeing that as useful enough to justify
all the complicated code to do it...
It's currently just a matter of the kvm-vfio device holding a reference
to the group so that it cannot be used elsewhere so long as it's being
used to elevate privileges on a given KVM instance. If we conclude that
access to a device with the right capability is required to gain a
privilege, I don't really see how we can wave aside the fact that the
privilege isn't lost along with the device.
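For contrast, a sketch of the model described above, where the privilege lasts only as long as a qualifying device is held. A plain counter stands in for the kvm-vfio group reference; this is a simplification, loosely similar in spirit to the non-coherent DMA count KVM keeps on x86 via kvm_arch_register_noncoherent_dma() and kvm_arch_unregister_noncoherent_dma(), but the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VM state: number of attached devices that can
 * issue no-snoop DMA. */
struct vm_wbinvd_tracked {
	unsigned int noncoherent_devs;
};

void tracked_attach(struct vm_wbinvd_tracked *vm, bool dev_can_nosnoop)
{
	if (dev_can_nosnoop)
		vm->noncoherent_devs++;
}

void tracked_detach(struct vm_wbinvd_tracked *vm, bool dev_can_nosnoop)
{
	if (dev_can_nosnoop) {
		assert(vm->noncoherent_devs > 0);	/* unbalanced detach is a bug */
		vm->noncoherent_devs--;
	}
}

/* The privilege exists only while at least one such device is held. */
bool tracked_wbinvd_allowed(const struct vm_wbinvd_tracked *vm)
{
	return vm->noncoherent_devs > 0;
}
```

Under this model, releasing the last such device drops the count to zero and wbinvd emulation would be turned off again, which is exactly the revocation the sticky policy leaves out.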

For KVM, qemu can turn it on or off on hot-plug events as required for
VM security. It doesn't need to rely on the kernel to control this.
Yes, QEMU can reject a hot-unplug event, but then QEMU retains the
privilege that the device grants it. Releasing the device and
retaining the privilege gained by it seems wrong. Thanks,