Re: [RFC] /dev/ioasid uAPI proposal
From: Paolo Bonzini
Date: Wed Jun 09 2021 - 11:20:49 EST
On 09/06/21 16:45, Jason Gunthorpe wrote:
> On Wed, Jun 09, 2021 at 08:31:34AM -0600, Alex Williamson wrote:
>> If we go back to the wbinvd ioctl mechanism, if I call that ioctl with
>> an ioasidfd that contains no devices, then I shouldn't be able to
>> generate a wbinvd on the processor, right? If I add a device,
>> especially in a configuration that can generate non-coherent DMA, now
>> that ioctl should work. If I then remove all devices from that ioasid,
>> what then is the difference from the initial state. Should the ioctl
>> now work because it worked once in the past?
>
> The ioctl is fine, but telling KVM to enable WBINVD is very similar to
> open and then reconfiguring the ioasid_fd is very similar to
> chmod. From a security perspective revoke is not strictly required,
> IMHO.

I absolutely do *not* want an API that tells KVM to enable WBINVD. This
is not up for discussion.
But really, let's stop calling the file descriptor a security proof or a
capability. It's overkill; all that we are doing here is kernel
acceleration of the WBINVD ioctl.
As a thought experiment, let's consider what would happen if wbinvd
caused an unconditional exit from guest to userspace. Userspace would
react by invoking the ioctl on the ioasid. The proposed functionality
is just an acceleration of this same thing, avoiding the
guest->KVM->userspace->IOASID->wbinvd trip.
This is why the API that I want, and that already exists for VFIO
group file descriptors, informs KVM of which "ioctls" the guest should
be able to do via privileged instructions[1]. Then the kernel works out
with KVM how to ensure a 1:1 correspondence between the operation of the
ioctls and the privileged operations.
One way to do it would be to always trap WBINVD and invoke the same
kernel function that implements the ioctl. The function would do either
a wbinvd or nothing, based on whether the ioasid has any device. The
next logical step is a notification mechanism that enables WBINVD (by
disabling the WBINVD intercept) when there are devices in the ioasidfd,
and disables WBINVD (by enabling a no-op intercept) when there are none.
And in fact once all VFIO devices are gone, wbinvd is for all practical
purposes a no-op as far as the guest kernel can tell. So there's no
reason to treat it as anything but a no-op.
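
To make the 1:1 correspondence concrete, here is a toy userspace model of
the scheme described above. It is only a sketch: every name in it
(struct ioasid_model, ioasid_wbinvd_ioctl(), guest_wbinvd() and so on) is
made up for illustration and is not an existing or proposed kernel
interface, and a counter stands in for the real wbinvd instruction. The
point it shows is that the guest instruction and the ioctl always agree,
and that the intercept state follows device attach/detach rather than
anything that happened in the past.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an ioasid; not a real kernel structure. */
struct ioasid_model {
    int nr_devices;           /* devices currently attached */
    bool wbinvd_intercepted;  /* true: guest WBINVD hits a no-op intercept */
    unsigned long flushes;    /* stands in for real wbinvd executions */
};

static void ioasid_model_init(struct ioasid_model *m)
{
    m->nr_devices = 0;
    m->wbinvd_intercepted = true;  /* no devices yet, so WBINVD is a no-op */
    m->flushes = 0;
}

/* The "notification": intercept state depends only on current devices. */
static void ioasid_update_intercept(struct ioasid_model *m)
{
    m->wbinvd_intercepted = (m->nr_devices == 0);
}

static void ioasid_attach(struct ioasid_model *m)
{
    m->nr_devices++;
    ioasid_update_intercept(m);
}

static void ioasid_detach(struct ioasid_model *m)
{
    assert(m->nr_devices > 0);
    m->nr_devices--;
    ioasid_update_intercept(m);
}

/* The ioctl: flush only if the ioasid has a device. Returns true if it flushed. */
static bool ioasid_wbinvd_ioctl(struct ioasid_model *m)
{
    if (m->nr_devices == 0)
        return false;   /* nothing attached: no-op */
    m->flushes++;       /* a real implementation would execute wbinvd here */
    return true;
}

/* The guest executing WBINVD: same outcome as the ioctl, minus the round trip. */
static bool guest_wbinvd(struct ioasid_model *m)
{
    if (m->wbinvd_intercepted)
        return false;   /* no-op intercept */
    m->flushes++;       /* models the instruction running without an exit */
    return true;
}
```

In this model, attaching a device makes both guest_wbinvd() and
ioasid_wbinvd_ioctl() flush, and detaching the last device turns both
back into no-ops, which is exactly the initial state.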
Thanks,
Paolo
[1] As an aside, I must admit I didn't entirely understand the design of
the KVM-VFIO device back when Alex added it. But with this model it was
absolutely the right thing to do, and it remains the right thing to do
even if VFIO groups are replaced with IOASID file descriptors.