Re: [PATCH v2] docs: security: Confidential computing intro and threat model for x86 virtualization

From: Dmytro Maluka
Date: Fri Jun 16 2023 - 08:36:49 EST


On 6/14/23 16:15, Sean Christopherson wrote:
> On Wed, Jun 14, 2023, Elena Reshetova wrote:
>>>> +The specific details of the CoCo security manager vastly diverge between
>>>> +technologies. For example, in some cases, it will be implemented in HW
>>>> +while in others it may be pure SW. In some cases, such as for the
>>>> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
>>>> +the CoCo security manager is a small, isolated and highly privileged
>>>> +(compared to the rest of SW running on the host) part of a traditional
>>>> +VMM.
>>>
>>> I say that "virtualized environments" isn't a good description because
>>> while pKVM does utilize hardware virtualization, my understanding is that
>>> the primary use cases for pKVM don't have the same threat model as SNP/TDX,
>>> e.g. IIUC many (most? all?) pKVM guests don't require network access.
>>
>> Not having a network access requirement doesn’t implicitly invalidate the
>> separation guarantees between the host and guest, it just makes it easier
>> since you have one interface less between the host and guest.
>
> My point is that if the protected guest doesn't need any I/O beyond the hardware
> device that it accesses, then the threat model is different because many of the
> new/novel attack surfaces that come with the TDX/SNP threat model don't exist.
> E.g. the hardening that people want to do for VirtIO drivers may not be at all
> relevant to pKVM.

Strictly speaking, the protected pKVM guest does need some I/O beyond
that: some limited and specialized communication between the host and
the guest, e.g. over vsock. For example, in the fingerprint use case,
the guest receives requests from the host to capture fingerprint data
from the sensor, sends encrypted fingerprint templates to the host, and
so on.

Additionally, the guest does not entirely own the hardware device. It
has direct, exclusive control over data communication with the device
(ensured by its exclusive access to the MMIO and DMA buffers), but,
e.g., the device interrupts are forwarded to the guest by the host, and
the PCI config space is virtualized by the host.

But I think I get what you mean: there is no data transfer in which the
host is not an endpoint but an intermediary between the guest and some
device. In other words, things like virtio-net or virtio-blk are out of
scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I
suppose it is correct for pKVM-on-ARM use cases as well). I guess this
means that "guest data attacks" may not be relevant to pKVM, and perhaps
it makes the pKVM threat model substantially different from that of
cloud use cases.

However, other kinds of threats described in the doc do seem to be
relevant to pKVM. "Malformed/malicious runtime input" is relevant since
communication channels between the host and the guest do exist, the host
may arbitrarily inject interrupts into the guest, etc. "Guest malicious
configuration" is relevant too, and guest attestation is required, as I
wrote in [1].

Cc'ing android-kvm and some ChromeOS folks to correct me if needed.

> And I don't see any need to formally document pKVM's threat model right *now*.
> pKVM on x86 is little more than a proposal at this point, and while I would love
> to see documentation for pKVM on ARM's threat model, that obviously doesn't belong
> in a doc that's x86 specific.

Agreed. I also don't think it makes sense to mention pKVM-on-x86 without
mentioning pKVM-on-ARM, as if pKVM-on-x86 had more in common with cloud
use cases than with pKVM-on-ARM, when quite the opposite is true.

There seems to be no reason why the pKVM-on-x86 threat model should be
different from the pKVM-on-ARM one. The use cases on ARM (for Android)
and on x86 (for ChromeOS) are somewhat different at the moment (in that
in ChromeOS use cases the protected guest's sensitive data also includes
data coming directly from a physical device), but IIUC they are
converging now, i.e. Android is getting interested in use cases with
physical devices too.

>>>> +potentially misbehaving host (which can also include some part of a
>>>> +traditional VMM or all of it), which is typically placed outside of the
>>>> +CoCo VM TCB due to its large SW attack surface. It is important to note
>>>> +that this doesn’t imply that the host or VMM are intentionally
>>>> +malicious, but that there exists a security value in having a small CoCo
>>>> +VM TCB. This new type of adversary may be viewed as a more powerful type
>>>> +of external attacker, as it resides locally on the same physical machine
>>>> +-in contrast to a remote network attacker- and has control over the guest
>>>> +kernel communication with most of the HW::
>>>
>>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
>>> specifically aims to give a "guest" exclusive access to hardware resources.
>>
>> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
>> x86 considerably different.
>
> Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> just a simple majority.

Again, with pedantic mode on: I find it difficult to agree with the
wording that the guest owns "most of" the HW resources it uses. It
controls the data communication with its hardware device, but other
resources (e.g. CPU time, interrupts, timers, PCI config space, ACPI)
are owned by the host and virtualized by it for the guest.

[1] https://lore.kernel.org/all/2cfa3122-6b54-aab5-8a61-41c08853286b@xxxxxxxxxxxx/