Re: [PATCH v1 00/15] Add support for Nitro Enclaves

From: Alexander Graf
Date: Tue Apr 28 2020 - 11:25:58 EST




On 27.04.20 13:44, Liran Alon wrote:

On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:

On 25/04/2020 18:25, Liran Alon wrote:

On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:

The memory and CPUs are carved out of the primary VM, they are
dedicated for the enclave. The Nitro hypervisor running on the host
ensures memory and CPU isolation between the primary VM and the
enclave VM.
I hope you properly take into consideration Hyper-Threading
speculative side-channel vulnerabilities here.
i.e. Usually cloud providers designate each CPU core to be assigned
to run only vCPUs of a specific guest, to avoid sharing a single CPU
core between multiple guests.
To handle this properly, you need to use some kind of core-scheduling
mechanism (Such that each CPU core either runs only vCPUs of enclave
or only vCPUs of primary VM at any given point in time).

In addition, can you elaborate more on how the enclave memory is
carved out of the primary VM?
Does this involve performing a memory hot-unplug operation from
primary VM or just unmap enclave-assigned guest physical pages from
primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?

Correct, we take into consideration the HT setup. The enclave gets
dedicated physical cores. The primary VM and the enclave VM don't run
on CPU siblings of a physical core.
The way I would imagine this to work is that Primary-VM just specifies
how many vCPUs the Enclave-VM will have and those vCPUs will be set with
affinity to run on the same physical CPU cores as Primary-VM.
But with the exception that scheduler is modified to not run vCPUs of
Primary-VM and Enclave-VM as sibling on the same physical CPU core
(core-scheduling). i.e. This is different than primary-VM losing
physical CPU cores permanently as long as the Enclave-VM is running.
Or maybe this should even be controlled by a knob in virtual PCI device
interface to allow flexibility to customer to decide if Enclave-VM needs
dedicated CPU cores or is it ok to share them with Primary-VM
as long as core-scheduling is used to guarantee proper isolation.

Running both parent and enclave on the same core can *potentially* lead to L2 cache leakage, so we decided not to go with it :).
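
To make the isolation property discussed above concrete, here is a minimal sketch of a check that no physical core has hyperthreads split between the primary VM and the enclave. The topology map, the owner labels and the function name are hypothetical and only for illustration; this is not how the Nitro hypervisor implements it.

```python
from collections import defaultdict


def shared_cores(core_of_cpu, owner_of_cpu):
    """Return the physical cores whose hyperthreads are split across owners.

    core_of_cpu:  logical CPU id -> physical core id (hypothetical topology)
    owner_of_cpu: logical CPU id -> "primary" or "enclave"
    """
    owners_per_core = defaultdict(set)
    for cpu, owner in owner_of_cpu.items():
        owners_per_core[core_of_cpu[cpu]].add(owner)
    # A core is a problem if more than one owner runs on its sibling threads.
    return sorted(core for core, owners in owners_per_core.items()
                  if len(owners) > 1)


# Assumed 4-core, 8-thread host: CPUs n and n+4 are siblings on core n.
topology = {cpu: cpu % 4 for cpu in range(8)}

# Enclave gets both threads of cores 2 and 3, so no core is shared.
dedicated = {0: "primary", 4: "primary", 1: "primary", 5: "primary",
             2: "enclave", 6: "enclave", 3: "enclave", 7: "enclave"}

# Enclave gets only CPUs 6 and 7, so cores 2 and 3 are split between the VMs.
split = {**dedicated, 2: "primary", 3: "primary"}
```

With dedicated cores the check returns an empty list; with the split assignment it reports the affected cores, which is the case that would need core-scheduling to be safe.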


Regarding the memory carve out, the logic includes page table entries
handling.
As I thought. Thanks for confirmation.

IIRC, memory hot-unplug can be used for the memory blocks that were
previously hot-plugged.

https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html


I don't quite understand why Enclave VM needs to be
provisioned and torn down during primary VM's runtime.

For example, an alternative could have been to just provision both
primary VM and Enclave VM on primary VM startup.
Then, wait for primary VM to set up a communication channel with
Enclave VM (E.g. via virtio-vsock).
Then, primary VM is free to request Enclave VM to perform various
tasks when required on the isolated environment.
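
As a rough sketch of the request/response pattern described above: the primary VM sends a task over a stream socket and the enclave replies. In a real setup the socket would presumably be an AF_VSOCK stream socket addressed by (CID, port); here socket.socketpair() is used as a stand-in so the example runs without vsock support. The length-prefixed framing and the message contents are assumptions for illustration, not part of the patch series.

```python
import socket
import struct


def send_msg(sock, payload: bytes) -> None:
    # Prefix each message with its length as a 4-byte big-endian integer.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    # Stream sockets may return short reads, so loop until n bytes arrive.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo() -> bytes:
    # Stand-in for the primary <-> enclave channel (would be AF_VSOCK).
    primary, enclave = socket.socketpair()
    with primary, enclave:
        send_msg(primary, b"sign:hello")    # primary VM requests a task
        task = recv_msg(enclave)            # enclave receives the request
        send_msg(enclave, b"done:" + task)  # enclave sends back a result
        return recv_msg(primary)            # primary VM reads the result
```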

Such setup will mimic a common Enclave setup. Such as Microsoft
Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
similar to TEEs running on ARM TrustZone.
i.e. In my alternative proposed solution, the Enclave VM is similar
to VTL1/TrustZone.
It will also avoid requiring introducing a new PCI device and driver.

True, this can be another option, to provision the primary VM and the
enclave VM at launch time.

In the proposed setup, the primary VM starts with the initial
allocated resources (memory, CPUs). The launch path of the enclave VM,
as it's spawned on the same host, is done via the ioctl interface -
PCI device - host hypervisor path. Short-running or long-running
enclave can be bootstrapped during primary VM lifetime. Depending on
the use case, a custom set of resources (memory and CPUs) is set for
an enclave and then given back when the enclave is terminated; these
resources can be used for another enclave spawned later on or the
primary VM tasks.
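
The resource lifecycle described above (carve out CPUs and memory for an enclave, give them back on termination) can be modelled with a small bookkeeping sketch. The class and method names are hypothetical and do not correspond to the ioctl interface of the driver; this only illustrates the accounting.

```python
class PrimaryVM:
    """Toy model of a primary VM lending resources to enclaves."""

    def __init__(self, cpus, memory_mib):
        self.cpus = set(cpus)          # CPUs currently usable by the primary VM
        self.memory_mib = memory_mib   # memory currently usable by the primary VM
        self.enclaves = {}             # enclave name -> (cpus, memory_mib)

    def create_enclave(self, name, cpus, memory_mib):
        cpus = set(cpus)
        if name in self.enclaves:
            raise ValueError("enclave already exists")
        if not cpus <= self.cpus or memory_mib > self.memory_mib:
            raise ValueError("not enough resources in the primary VM")
        # Carve the resources out of the primary VM.
        self.cpus -= cpus
        self.memory_mib -= memory_mib
        self.enclaves[name] = (cpus, memory_mib)

    def terminate_enclave(self, name):
        # Give the resources back so a later enclave or the primary VM can use them.
        cpus, memory_mib = self.enclaves.pop(name)
        self.cpus |= cpus
        self.memory_mib += memory_mib


vm = PrimaryVM(cpus=range(8), memory_mib=16384)
vm.create_enclave("e1", cpus={2, 3, 6, 7}, memory_mib=4096)
during = (set(vm.cpus), vm.memory_mib)
vm.terminate_enclave("e1")
after = (set(vm.cpus), vm.memory_mib)
```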

Yes, I already understood this is how the mechanism works. I'm
questioning whether this is indeed a good approach that should also be
taken by upstream.

I thought the point of Linux was to support devices that exist, rather than change the way the world works around it? ;)

The use-case of using Nitro Enclaves is for a Confidential-Computing
service. i.e. The ability to provision a compute instance that can be
trusted to perform a bunch of computation on sensitive
information with high confidence that it cannot be compromised as it's
highly isolated. Some technologies such as Intel SGX and AMD SEV
attempted to achieve this even with guarantees that
the computation is isolated from the hardware and hypervisor itself.

Yeah, that worked really well, didn't it? ;)

I would have expected that for the vast majority of real customer
use-cases, the customer will provision a compute instance that runs some
confidential-computing task in an enclave which it
keeps running for the entire life-time of the compute instance. As the
sole purpose of the compute instance is to just expose a service that
performs some confidential-computing task.
For those cases, it should have been sufficient to just pre-provision a
single Enclave-VM that performs this task, together with the compute
instance and connect them via virtio-vsock.
Without introducing any new virtual PCI device, guest PCI driver and
unique semantics of stealing resources (CPUs and Memory) from primary-VM
at runtime.

You would also need to preprovision the image that runs in the enclave, which is usually only determined at runtime. For that you need the PCI driver anyway, so why not make the creation dynamic too?

In this Nitro Enclave architecture, we de-facto put Compute
control-plane abilities in the hands of the guest VM, instead of
introducing new control-plane primitives that allow building
the data-plane architecture desired by the customer in a flexible manner.
* What if the customer prefers to have its Enclave VM polling an S3 bucket
for new tasks and produce results to S3 as well? Without having any
"Primary-VM" or virtio-vsock connection of any kind?
* What if for some use-cases customer wants Enclave-VM to have dedicated
compute power (i.e. Not share physical CPU cores with primary-VM. Not
even with core-scheduling) but for other
use-cases, customer prefers to share physical CPU cores with Primary-VM
(Together with core-scheduling guarantees)? (Although this could be
addressed by extending the virtual PCI device
interface with a knob to control this)

An alternative would have been to have the following new control-plane
primitives:
* Ability to provision a VM without boot-volume, but instead from an
Image that is used to boot from memory. Allowing to provision disk-less VMs.
 (E.g. Can be useful for other use-cases such as VMs not requiring EBS
at all which could allow cheaper compute instance)
* Ability to provision a group of VMs together as a group such that they
are guaranteed to launch as sibling VMs on the same host.
* Ability to create a fast-path connection between sibling VMs on the
same host with virtio-vsock. Or even also other shared-memory mechanism.
* Extend AWS Fargate with ability to run multiple microVMs as a group
(Similar to above) connected with virtio-vsock. To allow on-demand scale
of confidential-computing task.

Yes, there are a *lot* of different ways to implement enclaves in a cloud environment. This is the one that we focused on, but I'm sure others in the space will have more ideas. It's definitely an interesting space and I'm eager to see more innovation happening :).

Having said that, I do see a similar architecture to Nitro Enclaves
virtual PCI device used for a different purpose: For hypervisor-based
security isolation (Such as Windows VBS).
E.g. Linux boot-loader can detect the presence of this virtual PCI
device and use it to provision multiple VM security domains. Such that
when a security domain is created,
it is specified which hardware resources it has access to (Guest
memory pages, IOPorts, MSRs, etc.) and the blob it should run to
bootstrap. Similar to, but superior to,
Hyper-V VSM. In addition, some security domains will be given special
abilities to control other security domains (For example, to control the
+XS,+XU EPT bits of other security
domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an
idea... :)

Yes, absolutely! So much fun to be had :D


Alex


