Re: [PATCH v1 00/15] Add support for Nitro Enclaves

From: Liran Alon
Date: Mon Apr 27 2020 - 07:44:26 EST



On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:

On 25/04/2020 18:25, Liran Alon wrote:

On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:

The memory and CPUs are carved out of the primary VM and dedicated to the enclave. The Nitro hypervisor running on the host ensures memory and CPU isolation between the primary VM and the enclave VM.
I hope you properly take into consideration Hyper-Threading speculative side-channel vulnerabilities here.
i.e. Cloud providers usually dedicate each CPU core to running only the vCPUs of a specific guest, to avoid sharing a single CPU core between multiple guests.
To handle this properly, you need to use some kind of core-scheduling mechanism (such that each CPU core runs either only vCPUs of the enclave or only vCPUs of the primary VM at any given point in time).

In addition, can you elaborate more on how the enclave memory is carved out of the primary VM?
Does this involve performing a memory hot-unplug operation from the primary VM, or just unmapping the enclave-assigned guest physical pages from the primary VM's SLAT (EPT/NPT) and mapping them only in the enclave's SLAT?

Correct, we take into consideration the HT setup. The enclave gets dedicated physical cores. The primary VM and the enclave VM don't run on CPU siblings of a physical core.
The way I would imagine this to work is that the Primary-VM just specifies how many vCPUs the Enclave-VM will have, and those vCPUs will be set with affinity to run on the same physical CPU cores as the Primary-VM.
But with the exception that the scheduler is modified to not run vCPUs of the Primary-VM and the Enclave-VM as siblings on the same physical CPU core (core-scheduling). i.e. This is different from the Primary-VM losing
physical CPU cores permanently for as long as the Enclave-VM is running.
Or maybe this should even be controlled by a knob in the virtual PCI device interface, to give the customer the flexibility to decide whether the Enclave-VM needs dedicated CPU cores or whether it is OK to share them with the Primary-VM
as long as core-scheduling is used to guarantee proper isolation.
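
To make the core-scheduling constraint concrete, here is a minimal Python sketch of the invariant it has to maintain. This is only a toy model, not how the Nitro hypervisor or any real scheduler implements it: the function name, the thread-to-core map and the "primary"/"enclave" labels are all made up for illustration.

```python
from collections import defaultdict


def core_scheduling_violations(thread_to_core, running):
    """Return the IDs of physical cores whose sibling hardware threads
    are currently running vCPUs that belong to different VMs.

    thread_to_core: hardware thread ID -> physical core ID (hypothetical topology)
    running:        hardware thread ID -> VM label, or None if the thread is idle
    """
    owners_per_core = defaultdict(set)
    for thread, owner in running.items():
        if owner is not None:  # idle siblings do not leak anything
            owners_per_core[thread_to_core[thread]].add(owner)
    return {core for core, owners in owners_per_core.items() if len(owners) > 1}


# Hypothetical host: 2 physical cores, 2 hyperthreads each.
thread_to_core = {0: 0, 1: 0, 2: 1, 3: 1}

# Allowed: each core runs vCPUs of a single VM at this point in time.
ok = {0: "primary", 1: "primary", 2: "enclave", 3: None}

# Not allowed: core 0 runs a primary vCPU next to an enclave vCPU.
bad = {0: "primary", 1: "enclave", 2: "enclave", 3: "enclave"}

print(core_scheduling_violations(thread_to_core, ok))
print(core_scheduling_violations(thread_to_core, bad))
```

With dedicated cores the invariant holds trivially; with shared cores the scheduler has to guarantee it at every scheduling decision.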

Regarding the memory carve out, the logic includes page table entries handling.
As I thought. Thanks for the confirmation.
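
To illustrate the SLAT-based carve-out discussed above (as opposed to memory hot-unplug), here is a rough Python model. It is a sketch under assumptions, not the actual Nitro logic: the SLAT is modeled as a plain dict from guest-physical to host-physical page addresses, the enclave's guest-physical space is assumed to be filled contiguously from 0, and the function name is made up.

```python
PAGE_SIZE = 4096


def carve_out(primary_slat, enclave_slat, primary_gpas):
    """Move pages from the primary VM's SLAT into the enclave's SLAT.

    Both SLATs are toy dicts: guest-physical page address -> host-physical
    page address. After the call, the moved host pages are reachable only
    through the enclave's mapping.
    """
    if len(set(primary_gpas)) != len(primary_gpas):
        raise ValueError("duplicate guest-physical pages requested")
    missing = [gpa for gpa in primary_gpas if gpa not in primary_slat]
    if missing:
        raise KeyError(f"pages not mapped in the primary VM: {missing}")

    for gpa in primary_gpas:
        host_page = primary_slat.pop(gpa)              # unmap from the primary VM
        enclave_gpa = len(enclave_slat) * PAGE_SIZE    # assumption: contiguous from 0
        enclave_slat[enclave_gpa] = host_page          # map only in the enclave


primary = {0x1000: 0xA000, 0x2000: 0xB000, 0x3000: 0xC000}
enclave = {}
carve_out(primary, enclave, [0x2000, 0x3000])
print(primary)
print(enclave)
```

A real implementation would also need TLB invalidation and scrubbing of the pages when they are handed back; those steps are left out here.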

IIRC, memory hot-unplug can be used for the memory blocks that were previously hot-plugged.

https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html__;!!GqivPVa7Brio!MubgaBjJabDtNzNpdOxxbSKtLbqXHbsEpTtZ1mj-rnfLvMIbLW1nZ8cK10GhYJQ$

I don't quite understand why the Enclave VM needs to be provisioned and torn down during the primary VM's runtime.

For example, an alternative could have been to just provision both primary VM and Enclave VM on primary VM startup.
Then, wait for the primary VM to set up a communication channel with the Enclave VM (e.g. via virtio-vsock).
Then, primary VM is free to request Enclave VM to perform various tasks when required on the isolated environment.
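
As a rough illustration of that request/response flow, the primary VM side could look like the Python sketch below. It assumes a Linux guest where Python exposes socket.AF_VSOCK; the length-prefixed framing, the function names and the CID/port values are my own assumptions, not part of the patch series.

```python
import socket
import struct


def send_msg(sock, payload: bytes) -> None:
    """Send one message framed with a 4-byte big-endian length prefix."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def connect_to_enclave(cid: int, port: int) -> socket.socket:
    """Open a vsock stream to the enclave (cid and port are deployment-specific)."""
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect((cid, port))
    return sock


def request(sock, task: bytes) -> bytes:
    """Ask the enclave to perform a task and wait for its reply."""
    send_msg(sock, task)
    return recv_msg(sock)
```

Usage from the primary VM would be something like `request(connect_to_enclave(cid, port), b"...")`, with the enclave side running the mirror-image accept loop.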

Such a setup will mimic a common Enclave setup, such as Microsoft Windows VBS EPT-based Enclaves (which all run in VTL1). It is also similar to TEEs running on ARM TrustZone.
i.e. In my alternative proposed solution, the Enclave VM is similar to VTL1/TrustZone.
It will also avoid the need to introduce a new PCI device and driver.

True, this can be another option, to provision the primary VM and the enclave VM at launch time.

In the proposed setup, the primary VM starts with the initial allocated resources (memory, CPUs). The launch path of the enclave VM, as it's spawned on the same host, is done via the ioctl interface - PCI device - host hypervisor path. Short-running or long-running enclaves can be bootstrapped during the primary VM's lifetime. Depending on the use case, a custom set of resources (memory and CPUs) is set for an enclave and then given back when the enclave is terminated; these resources can be used for another enclave spawned later on or for the primary VM's tasks.
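
The resource lifecycle described above can be modeled in a few lines of Python. This is only a bookkeeping sketch under assumptions: the class and method names are hypothetical and do not correspond to the ioctl or PCI device interface in the patches.

```python
class PrimaryVmResources:
    """Toy accounting of CPUs and memory lent by a primary VM to enclaves."""

    def __init__(self, cpus, memory_mib):
        self.cpus = set(cpus)          # CPUs currently usable by the primary VM
        self.memory_mib = memory_mib   # memory currently usable by the primary VM
        self.enclaves = {}             # enclave name -> (cpus, memory_mib)

    def start_enclave(self, name, cpus, memory_mib):
        """Carve the requested CPUs and memory out of the primary VM."""
        cpus = set(cpus)
        if name in self.enclaves:
            raise ValueError(f"enclave {name!r} is already running")
        if not cpus <= self.cpus or memory_mib > self.memory_mib:
            raise ValueError("not enough free resources in the primary VM")
        self.cpus -= cpus
        self.memory_mib -= memory_mib
        self.enclaves[name] = (cpus, memory_mib)

    def terminate_enclave(self, name):
        """Give the enclave's CPUs and memory back to the primary VM."""
        cpus, memory_mib = self.enclaves.pop(name)
        self.cpus |= cpus
        self.memory_mib += memory_mib


vm = PrimaryVmResources(cpus=range(8), memory_mib=16384)
vm.start_enclave("e1", cpus={6, 7}, memory_mib=4096)
print(sorted(vm.cpus), vm.memory_mib)
vm.terminate_enclave("e1")
print(sorted(vm.cpus), vm.memory_mib)
```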

Yes, I already understood this is how the mechanism works. I'm questioning whether this is indeed a good approach that should also be taken by upstream.

The use-case of using Nitro Enclaves is for a Confidential-Computing service. i.e. The ability to provision a compute instance that can be trusted to perform a bunch of computation on sensitive
information with high confidence that it cannot be compromised as it's highly isolated. Some technologies such as Intel SGX and AMD SEV attempted to achieve this even with guarantees that
the computation is isolated from the hardware and hypervisor itself.

I would have expected that for the vast majority of real customer use-cases, the customer will provision a compute instance that runs some confidential-computing task in an enclave which it
keeps running for the entire life-time of the compute instance. As the sole purpose of the compute instance is to just expose a service that performs some confidential-computing task.
For those cases, it should have been sufficient to just pre-provision a single Enclave-VM that performs this task, together with the compute instance, and connect them via virtio-vsock,
without introducing any new virtual PCI device, guest PCI driver or unique semantics of stealing resources (CPUs and memory) from the primary-VM at runtime.

In this Nitro Enclave architecture, we de-facto put Compute control-plane abilities in the hands of the guest VM, instead of introducing new control-plane primitives that allow building
the data-plane architecture desired by the customer in a flexible manner.
* What if the customer prefers to have its Enclave VM poll an S3 bucket for new tasks and produce results to S3 as well? Without having any "Primary-VM" or virtio-vsock connection of any kind?
* What if for some use-cases the customer wants the Enclave-VM to have dedicated compute power (i.e. Not share physical CPU cores with the primary-VM. Not even with core-scheduling) but for other
use-cases, the customer prefers to share physical CPU cores with the Primary-VM (together with core-scheduling guarantees)? (Although this could be addressed by extending the virtual PCI device
interface with a knob to control this)

An alternative would have been to have the following new control-plane primitives:
* Ability to provision a VM without a boot-volume, booting instead from an Image loaded into memory. This allows provisioning disk-less VMs.
 (E.g. Can be useful for other use-cases such as VMs not requiring EBS at all, which could allow a cheaper compute instance)
* Ability to provision a group of VMs together such that they are guaranteed to launch as sibling VMs on the same host.
* Ability to create a fast-path connection between sibling VMs on the same host with virtio-vsock. Or even with another shared-memory mechanism.
* Extend AWS Fargate with the ability to run multiple microVMs as a group (similar to the above) connected with virtio-vsock. To allow on-demand scaling of a confidential-computing task.

Having said that, I do see a similar architecture to the Nitro Enclaves virtual PCI device being used for a different purpose: hypervisor-based security isolation (such as Windows VBS).
E.g. The Linux boot-loader can detect the presence of this virtual PCI device and use it to provision multiple VM security domains. Such that when a security domain is created,
it is specified which hardware resources it has access to (guest memory pages, IOPorts, MSRs, etc.) and the blob it should run to bootstrap. Similar to, but superior to,
Hyper-V VSM. In addition, some security domains will be given special abilities to control other security domains (for example, to control the +XS,+XU EPT bits of other security
domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an idea... :)

-Liran