Re: [RFC 00/16] KVM protected memory extension

From: Liran Alon
Date: Mon May 25 2020 - 12:00:45 EST



On 25/05/2020 17:46, Kirill A. Shutemov wrote:
On Mon, May 25, 2020 at 04:47:18PM +0300, Liran Alon wrote:
On 22/05/2020 15:51, Kirill A. Shutemov wrote:
== Background / Problem ==

There are a number of hardware features (MKTME, SEV) which protect guest
memory from some unauthorized host access. The patchset proposes a purely
software feature that mitigates some of the same host-side read-only
attacks.


== What does this set mitigate? ==

- Host kernel "accidental" access to guest data (think speculation)
Just to clarify: this is any host kernel memory info-leak vulnerability, not
just speculative-execution memory info-leaks; architectural ones as well.

In addition, note that removing guest data from host kernel VA space also
makes guest<->host memory exploits more difficult.
E.g. the guest cannot use an already-available memory buffer in kernel VA space
for ROP, or for placing valuable guest-controlled code/data in general.

- Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))

- Host userspace access to guest data (compromised qemu)
I don't quite understand what the benefit is of preventing userspace VMM
access to guest data while the host kernel can still access it.
Let me clarify: the guest memory mapped into host userspace is made
inaccessible to both the host kernel and userspace. The host still has a way to
access it via a new interface: GUP(FOLL_KVM). The GUP will give you a struct page
that the kernel has to map (temporarily) if it needs to access the data. So only
blessed codepaths would know how to deal with this memory.
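
For illustration only (editor's sketch, not code from the patchset), a blessed
codepath using this interface might look roughly like the following. FOLL_KVM is
the GUP flag described above; the helper name is hypothetical and the exact GUP
signature varies between kernel versions.

#include <linux/mm.h>
#include <linux/highmem.h>

/*
 * Hypothetical helper: pin one KVM-protected guest page, temporarily map
 * it into kernel VA space, and copy data out of it.
 * Assumes @len does not cross a page boundary.
 */
static int copy_from_protected_guest(void *dst, unsigned long guest_uva,
                                     size_t len)
{
        struct page *page;
        void *vaddr;
        long ret;

        /* Pin the page; FOLL_KVM permits access to KVM-protected memory. */
        ret = get_user_pages_unlocked(guest_uva, 1, &page, FOLL_KVM);
        if (ret != 1)
                return -EFAULT;

        /* Temporarily map the page and copy the requested bytes. */
        vaddr = kmap(page);
        memcpy(dst, vaddr + offset_in_page(guest_uva), len);
        kunmap(page);

        put_page(page);
        return 0;
}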
Yes, I understood that. I meant explicit host kernel access.

It can help prevent some host->guest attacks from a compromised host.
For example, if a VM has successfully attacked the host, it cannot attack other
VMs as easily.

We have mechanisms to sandbox the userspace VMM process for that.

You need to be more specific about which attack scenario you are attempting to address
here that is not covered by existing mechanisms. I.e., be crystal clear on the extra value
of not exposing guest data to the userspace VMM.


It would also help to protect against guest->host attacks by removing one
more place where the guest's data is mapped on the host.
Because the guest has an explicit interface to request which guest pages can be mapped into the userspace VMM, the value of this is very small.

The guest already has the ability to map guest-controlled code/data into the userspace VMM, either via this interface or by forcing the userspace VMM
to create various objects during device emulation handling. The only extra property this patch-series provides is that only a
small portion of guest pages will be mapped into host userspace instead of all of them, resulting in smaller regions for exploits that require
guessing a virtual address. But: (a) userspace VMM device emulation may still allow the guest to spray the userspace heap with objects containing
guest-controlled data. (b) How is the userspace VMM supposed to limit which guest pages should not be mapped into it, even though the guest has
explicitly requested them to be mapped? (E.g. because they are valid DMA sources/targets for virtual devices, or because they are a vGPU frame-buffer.)
QEMU is more easily compromised than the host kernel because its
guest<->host attack surface is larger (e.g. various device emulation).
But this compromise comes from the guest itself, not other guests. This is in
contrast to the host kernel attack surface, where an info-leak can be
exploited from one guest to leak another guest's data.
Consider the case where an unprivileged guest user exploits a bug in QEMU
device emulation to gain access to data it normally cannot access
within the guest. With this feature it would only be able to see other shared
regions of guest memory, such as DMA and IO buffers, but not the rest.
This is a scenario where an unprivileged guest userspace process has direct access to a virtual device
and is able to exploit a bug in device emulation handling that allows it to compromise
the security *inside* the guest, i.e. leak guest kernel data or data of other guest userspace processes.

That's true. Good point. This is a very important argument that is missing from the cover-letter.

Now the trade-off considered here is crystal clear:
Is the extra complexity and perf cost of the mechanism in this patch-series worth it
to protect against the scenario of a userspace VMM vulnerability, reachable by an unprivileged
guest userspace process, that leaks other *in-guest* data not otherwise accessible to that process?

-Liran