[PATCH v21 26/28] docs: x86/sgx: Add Architecture documentation
From: Jarkko Sakkinen
Date: Sat Jul 13 2019 - 13:13:18 EST
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Document microarchitectural features of SGX relevant to the kernel.
They are documented in detail enough to understand the implementation.
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
Documentation/x86/index.rst | 1 +
Documentation/x86/sgx/1.Architecture.rst | 431 +++++++++++++++++++++++
Documentation/x86/sgx/index.rst | 16 +
3 files changed, 448 insertions(+)
create mode 100644 Documentation/x86/sgx/1.Architecture.rst
create mode 100644 Documentation/x86/sgx/index.rst
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f2de1b2d3ac7..dadcd3691e06 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -28,3 +28,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+ sgx/index
diff --git a/Documentation/x86/sgx/1.Architecture.rst b/Documentation/x86/sgx/1.Architecture.rst
new file mode 100644
index 000000000000..a4de6c610231
--- /dev/null
+++ b/Documentation/x86/sgx/1.Architecture.rst
@@ -0,0 +1,431 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Architecture
+============
+
+Introduction
+============
+
+SGX is a set of instructions and mechanisms that enable ring 3 applications to
+set aside private regions of code and data for the purpose of establishing and
+running enclaves. An enclave is a secure entity whose private memory can only
+be accessed by code running within the enclave. Accesses from outside the
+enclave, including software running at a higher privilege level and other
+enclaves, are disallowed by hardware.
+
+SGX also provides for local and remote attestation. `Attestation`_ allows an
+enclave to attest its identity, that it has not been tampered with, that it is
+running on a genuine platform with Intel SGX enabled, and the security
+properties of the platform on which it is running.
+
+You can determine if your CPU supports SGX by querying ``/proc/cpuinfo``:
+
+ ``cat /proc/cpuinfo | grep sgx``
+
+
+Enclave Page Cache
+==================
+
+SGX utilizes an Enclave Page Cache (EPC) to store pages that are associated
+with an enclave. The EPC is secure storage whose exact physical implementation
+is micro-architecture specific (see `EPC Implemenations`_). Similar to normal
+system memory, the EPC is managed by privileged software using conventional
+paging mechanisms, e.g. the kernel can grant/deny access to EPC memory by
+manipulating a process' page tables, and can swap pages in/out of the EPC in
+order to oversubscribe the EPC.
+
+Unlikely regular memory, hardware prevents arbitrary insertion, eviction,
+deletion, access, etc... to/from the EPC. Software must instead use dedicated
+`SGX instructions`_ to operate on the EPC, which enables the processor to
+provide SGX's security guarantees by enforcing various restrictions and
+behaviors, e.g. limits concurrent accesses to EPC pages and ensures proper TLB
+flushing when moving pages in/out of the EPC.
+
+Accesses to EPC pages are allowed if and only if the access is classified as an
+"enclave access". There are two categories of allowed enclave accesses: direct
+and indirect. Direct enclave accesses are generated if and only the processor
+is executing in Enclave Mode (see `Enclave execution`_). Indirect enclave
+accesses are generated by various ENCL{S,U,V} functions, many of which can be
+executed outside of Enclave Mode.
+
+Non-enclave accesses to the EPC result in undefined behavior. Conversely,
+enclave accesses to non-EPC memory result in a page fault (#PF)[1]_. Page
+faults due to invalid enclave accesses set the PF_SGX flag (bit 15) in the page
+fault error code[2]_.
+
+Although all EPC implementations will undoubtedly encrypt the EPC itself, all
+all EPC code/data is stored unencrypted in the processor's caches. I.e. SGX
+relies on the aforementioned mechanisms to protect an enclave's secrets while
+they are resident in the cache.
+
+Note, EPC pages are always 4KB sized and aligned. Software can map EPC using
+using large pages, but the processor always operates on a 4KB granularity when
+working with EPC pages.
+
+
+SGX instructions
+================
+
+SGX introduces three new instructions, ENCLS, ENCLU and ENCLV, for Supervisor,
+User and Virtualization respectively. ENCL{S,U,V} are umbrella instructions,
+using a single opcode as the front end to a variety of SGX functions. The leaf
+function to execute is specified via %eax, with %rbx, %rcx and %rdx optionally
+used for leaf-specific purposes.
+
+Note that supervisor software, i.e. the kernel, creates and manages enclaves,
+but only user-level software can execute/enter an enclave.
+
+ENCLS Leafs
+-----------
+
+ - ECREATE: create an enclave
+ - EADD: add page to an uninitialized enclave
+ - EAUG: add page to an initialized enclave
+ - EEXTEND: extended the measurement of an (uninitialized) enclave
+ - EINIT: verify and initialize enclave
+ - EDBG{RD,WR}: read/write from/to a debug enclave’s memory
+ - EMODPR: restrict an EPC page’s permissions
+ - EMODT: modify an EPC page’s type
+ - EBLOCK: mark a page as blocked in EPCM
+ - ETRACK{C}: activate blocking tracing
+ - EWB: write back page from EPC to regular memory
+ - ELD{B,U}{C}: load page in {un}blocked state from system memory to EPC
+ - EPA: add version array (use to track evicted EPC pages)
+ - EREMOVE: remove a page from EPC
+ - ERDINFO: retrieve info about an EPC page from EPCM
+
+ENCLU Leafs
+-----------
+ - EENTER: enter an enclave
+ - ERESUME: resume execution of an interrupted enclave
+ - EEXIT: exit an enclave
+ - EGETKEY: retrieve a cryptographic key from the processor
+ - EREPORT: generate a cryptographic report describing an enclave
+ - EMODPE: extend an EPC page's permissions
+ - EACCEPT: accept changes to an EPC page
+ - EACCEPTCOPY: copy an existing EPC page to an uninitialized EPC page
+
+ENCLV Leafs
+-----------
+ - E{DEC,INC}VIRTCHILD: {dec,inc}rement SECS virtual refcount
+ - ESETCONTEXT: set SECS’ context pointer
+
+
+EPC page types
+==============
+
+All pages in the EPC have an explicit page type identifying the type of page.
+The type of page affects the page's accessibility, concurrency requirements,
+lifecycle, etc...
+
+SGX Enclave Control Structure (SECS)
+ An enclave is defined and referenced by an SGX Enclave Control Structure.
+ When creating an enclave (via ECREATE), software provides a source SECS for
+ the enclave, which is copied into a target EPC page. The source SECS
+ contains security and measurement information, as well as attributes and
+ properties of the enclave. Once the SECS is copied into the EPC, it's used
+ by the processor to store enclave metadata, e.g. the number of EPC pages
+ associated with the enclave, and is no longer directly accessible by
+ software.
+
+Regular (REG)
+ Regular EPC pages contain the code and data of an enclave. Code and data
+ pages can be added to an uninitialized enclave (prior to EINIT) via EADD.
+ Post EINIT, pages can be added to an enclave via EAUG. Pages added via
+ EAUG must be explicitly accepted by the enclave via EACCEPT or EACCEPTCOPY.
+
+Thread Control Structure (TCS)
+ Thread Control Structure pages define the entry points to an enclave and
+ track the execution state of an enclave thread. A TCS can only be used by
+ a single logical CPU at any given time, but otherwise has no attachment to
+ any particular logical CPU. Like regular pages, TCS pages are added to
+ enclaves via EADD and EINIT.
+
+Version Array (VA)
+ Version Array pages contain 512 slots, each of which can contain a version
+ number for a page evicted from the EPC. A version number is a unique 8-byte
+ value that is fed into the MAC computation used to verify the contents of an
+ evicted page when reloading said page into the EPC. VA pages are the only
+ page type not directly associated with an enclave, and are allocated in the
+ EPC via EPA. Note that VA pages can also be evicted from the EPC, but
+ doing so requires another VA page/slot to hold the version number of the VA
+ page being evicted.
+
+Trim (TRIM)
+ The Trim page type indicates that a page has been trimmed from the enclave’s
+ address space and is no longer accessible to enclave software, i.e. is about
+ to be removed from the enclave (via EREMOVE). Removing pages from a running
+ enclaves requires the enclave to explicit accept the removal (via EACCEPT).
+ The intermediate Trim type allows software to batch deallocation operations
+ to improve efficiency, e.g. minimize transitions between userspace, enclave
+ and kernel.
+
+
+Enclave Page Cache Map
+======================
+
+The processor tracks EPC pages via the Enclave Page Cache Map (EPCM). The EPCM
+is a processor-managed structure that enforces access restrictions to EPC pages
+in addition to the software-managed page tables. The EPCM contains one entry
+per EPC page, and although the details are implementation specific, all
+implementations contain the following architectural information:
+
+ - The status of EPC page with respect to validity and accessibility.
+ - An SECS identifier of the enclave to which the page belongs.
+ - The type of page: regular, SECS, TCS, VA or TRIM
+ - The linear address through which the enclave is allowed to access the page.
+ - The specified read/write/execute permissions on that page.
+
+Access violations, e.g. insufficient permissions or incorrect linear address,
+detected via the EPCM result in a page fault (#PF)[1]_ exception being signaled
+by the processor. Page faults due to EPCM violations set the PF_SGX flag
+(bit 15) in the page fault error code[2]_.
+
+The EPCM is consulted if and only if walking the software-managed page tables,
+i.e. the kernel's page tables, succeeds. I.e. the effective permissions for an
+EPC page are a logical AND of the kernel's page tables and the corresponding
+EPCM entry. This allows the kernel to make its page tables more restrictive
+without triggering an EPCM violation, e.g. it may mark an entry as not-present
+prior to evicting a page from the EPC.
+
+**IMPORTANT** For all intents and purposes the SGX architecture allows the
+processor to invalidate all EPCM entries at will, i.e. requires that software
+be prepared to handle an EPCM fault at any time. Most processors are expected
+to implement the EPC{M} as a subset of system DRAM that is encrypted with an
+ephemeral key, i.e. a key that is randomly generated at processor reset. As a
+result of using an ephemeral key, the contents of the EPC{M} are lost when the
+processor is powered down as part of an S3 transition or when a virtual machine
+is live migrated to a new physical system.
+
+
+Enclave initialization
+======================
+
+Because software cannot directly access the EPC except when executing in an
+enclave, an enclave must be built using ENCLS functions (ECREATE and EADD) as
+opposed to simply copying the enclave from the filesystem to memory. Once an
+enclave is built, it must be initialized (via EINIT) before userspace can enter
+the enclave and begin `Enclave execution`_.
+
+During the enclave build process, two "measurements", i.e. SHA-256 hashes, are
+taken of the enclave: MRENCLAVE and MRSIGNER. MRENCLAVE measures the enclave's
+contents, e.g. code/data explicitly added to the measurement (via EEXTEND), as
+well as metadata from the enclave's build process, e.g. pages offsets (relative
+to the enclave's base) and page permissions of all pages added to the enclave
+(via EADD). MRENCLAVE is initialized by ECREATE and finalized by EINIT.
+MRSIGNER is simply the SHA-256 hash of the public key used to sign the enclave.
+
+EINIT accepts two parameters in addition to the SECS of the target enclave: an
+Enclave Signature Struct (SIGSTRUCT) and an EINIT token (EINITTOKEN).
+SIGSTRUCT is a structure created and signed by the enclave's developer. Among
+other fields, SIGSTRUCT contains the expected MRENCLAVE of the enclave and the
+MRSIGNER of the enclave. SIGSTRUCT's MRENCLAVE is used by the processor to
+verify that the enclave was properly built (at runtime), and its SIGSTRUCT is
+copied to the SECS upon successful EINIT. EINITTOKEN is an optional parameter
+that is consumed as part of `Launch Control`_.
+
+
+Enclave execution
+=================
+
+Enclaves execute in a bespoke sub-mode of ring 3, appropriately named Enclave
+Mode. Enclave Mode changes behavior in key ways to support SGX's security
+guarantees and to reduce the probability of unintentional disclosure of
+sensitive data.
+
+A notable cornerstone of Enclave Mode is the Enclave Linear Range (ELRANGE).
+An enclave is associated with one, and only one, contiguous linear address
+range, its ELRANGE. The ELRANGE is specified via the SIZE and BASEADDR fields
+in the SECS (provided to ECREATE). The processor queries the active enclave's
+ELRANGE to differentiate enclave and non-enclave accesses, i.e. accesses that
+originate in Enclave Mode *and* whose linear address falls within ELRANGE are
+considered (direct) enclave accesses. Note, the processor also generates
+(indirect) enclave accesses when executing ENCL* instructions, which may occur
+outside of Enclave Mode, e.g. when copying the SECS to its target EPC page
+during ECREATE.
+
+Enclave Mode changes include, but are not limited to:
+
+ - Permits direct software access to EPC pages owned by the enclave
+ - Ensures enclave accesses map to the EPC (EPCM violation, i.e. #PF w/ PF_SGX)
+ - Prevents executing code outside the enclave's ELRANGE (#GP fault)
+ - Changes the behavior of exceptions/events
+ - Causes many instructions to become illegal, i.e. generate an exception
+ - Supresses all instruction breakpoints*
+ - Suppresses data breakpoints within enclave's ELRANGE*
+
+ * For non-debug enclaves.
+
+Transitions to/from Enclave Mode have semantics that are a lovely blend of
+SYSCALL, SYSRET and VM-Exit. In normal execution, entering and exiting Enclave
+Mode can only be done through EENTER and EEXIT respectively. EENTER+EEXIT is
+analogous to SYSCALL+SYSRET, e.g. EENTER/SYSCALL load RCX with the next RIP and
+EEXIT/SYSRET load RIP from R{B,C}X, and EENTER can only jump to a predefined
+location controlled by the enclave/kernel.
+
+But when an exception, interrupt, VM-Exit, etc... occurs, enclave transitions
+behave more like VM-Exit and VMRESUME. To maintain the black box nature of the
+enclave, the processor automatically switches register context when any of the
+aforementioned events occur (the SDM refers to such events as Enclave Exiting
+Events (EEE)).
+
+To handle an EEE, the processor performs an Asynchronous Enclave Exits (AEX).
+Note, although exceptions and traps are synchronous from a processor execution
+perspective, the are asynchronous from the enclave's perspective as the enclave
+is not provided an opportunity to save/fuzz state prior to exiting the enclave.
+On an AEX, the processor exits the enclave to a predefined %rip called the
+Asynchronous Exiting Pointer (AEP). The AEP is specified at enclave entry (via
+EENTER/ERESUME) and saved into the associated TCS, similar to how a hypervisor
+specifies the VM-Exit target (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME), i.e. the
+the AEP is an exit location controlled by the enclave's untrusted runtime.
+
+On an AEX, the processor fully exits the enclave prior to vectoring the event,
+i.e. from the event handler's perspective the event occurred at the AEP. Thus,
+IRET/RSM/VMRESUME (from the event handler) returns control to the enclave's
+untrusted runtime, which can take appropriate action, e.g. immediately ERESUME
+the enclave on interrupts, forward expected exceptions to the enclave, restart
+the enclave on fatal exceptions, and so on and so forth.
+
+To preserve the enclave's state across AEX events, the processor automatically
+saves architectural into a State Save Area (SSA). Because SGX supports nested
+AEX events, e.g. the untrusted runtime can re-EENTER the enclave after an AEX,
+which can in turn trigger an AEX, the TCS holds a pointer to a stack of SSA
+frames (as opposed to a single SSA), an index to the current SSA frame and the
+total number of available frames. When an AEX occurs, the processor saves the
+architectural state into the TCS's current SSA frame. The untrusted runtime
+can then pop the last SSA frame (off the TCS's stack) via ERESUME, i.e. restart
+the enclave after the AEX is handled.
+
+
+Launch Control
+==============
+
+SGX provides a set of controls, referred to as Launch Control, that governs the
+initialization of enclaves. The processor internally stores a SHA-256 hash of
+a 3072-bit RSA public key, i.e. a MRSIGNER, often referred to as the "LE pubkey
+hash". The LE pubkey hash is used during EINIT to prevent launching an enclave
+without proper authorization. In order for EINIT to succeed, the enclave's
+MRSIGNER (from SIGSTRUCT) *or* the MRSIGNER of the enclave's EINITTOKEN must
+match the LE pubkey hash.
+
+An EINITTOKEN can only be created by a so called Launch Enclave (LE). A LE is
+an enclave with SECS.ATTRIBUTES.EINITTOKEN_KEY=1, which grants it access to the
+EINITTOKEN_KEY (retrieved via EGETKEY). EINITTOKENs provide a ready-built
+mechanism for userspace to bless enclaves without requiring additional kernel
+infrastructure.
+
+Processors that support SGX Launch Control Configuration, enumerated by the
+SGX_LC flag (bit 30 in CPUID 0x7.0x0.ECX), expose the LE pubkey hash as a set
+of four MSRs, aptly named IA32_SGXLEPUBKEYHASH[0-3]. The reset value of the
+MSRs is an internally defined (Intel) key (processors that don't support
+SGX_LC also use an internally defined key, it's just not exposed to software).
+
+While the IA32_SGXLEPUBKEYHASH MSRs are readable on any platform that supports
+SGX_LC, the MSRs are only writable if the IA32_FEATURE_CONTROL is locked with
+bit 17 ("SGX Launch Control Enable" per the SDM, or more accurately "SGX LE
+pubkey hash writable") set to '1'. Note, the MSRs are also writable prior to
+`SGX activation`_.
+
+Note, while "Launch Control Configuration" is the official feature name used by
+the Intel SDM, other documentation may use the term "Flexible Launch Control",
+or even simply "Launch Control". Colloquially, the vast majority of usage of
+the term "Launch Control" is synonymous with "Launch Control Configuration".
+
+
+EPC oversubscription
+====================
+
+SGX supports the concept of EPC oversubscription. Analogous to swapping system
+DRAM to disk, enclave pages can be swapped from the EPC to memory, and later
+reloaded from memory to the EPC. But because the kernel is untrusted, swapping
+pages in/out of the EPC has specialized requirements:
+
+ - The kernel cannot directly access EPC memory, i.e. cannot copy data to/from
+ the EPC.
+ - The kernel must "prove" to hardware that there are no valid TLB entries for
+ said page prior to eviction (a stale TLB entry would allow an attacker to
+ bypass SGX access controls).
+ - When loading a page back into the EPC, hardware must be able to verify
+ the integrity and freshness of the data.
+ - When loading an enclave page, e.g. regular and TCS pages, hardware must be
+ able to associate the page with an SECS, i.e. refcount an enclaves pages.
+
+To satisfy the above requirements, the CPU provides dedicated ENCLS functions
+to support paging data in/out of the EPC:
+
+ - EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting to access
+ a blocked page that misses the TLB will fault.
+ - ETRACK: Activate TLB tracking. Hardware verifies that all translations for
+ pages marked as "blocked" have been flushed from the TLB.
+ - EPA: Add Version Array page to the EPC (see `EPC page types`_)
+ - EWB: Write back a page from EPC to memory, e.g. RAM. Software must
+ supply a VA slot, memory to hold the Paging Crypto Metadata (PCMD) of the
+ page and obviously backing for the evicted page.
+ - ELD*: Load a page in {un}blocked state from memory to EPC.
+
+Swapped EPC pages are {de,en}crypted on their way in/out of the EPC, e.g. EWB
+encrypts and ELDU decrypts. The version number (stored in a VA page) and PCMD
+structure associated with an evicted EPC page seal a page (prevent undetected
+modification) and ensure its freshness (prevent rollback to a stale version of
+the page) while the page resides in unprotected storage, e.g. memory or disk.
+
+
+Attestation
+===========
+
+SGX provides mechanisms that allow software to implement what Intel refers to
+as Local Attestation (used by enclaves running on a the same physical platform
+to securely identify one another) and Remote Attestation (a process by which an
+enclave attests itself to a remote entity in order to gain the trust of said
+entity).
+
+The details of Local Attestation and Remote Attestation are far beyond the
+scope of this document. Please see Intel's Software Developer's Manual and/or
+use your search engine of choice to learn more about SGX's attestation
+capabilities.
+
+
+EPC Implemenations
+==================
+
+PRM with MEE
+--------------
+
+Initial hardware support for SGX implements the EPC by reserving a chunk of
+system DRAM, referred to as Processor Reserved Memory (PRM). A percentage of
+PRM is consumed by the processor to implement the EPCM, with the remainder of
+PRM being exposed to software as the EPC. PRM is configured by firmware via
+dedicated PRM Range Registers (PRMRRs). The PRMRRs are locked which are locked as part of SGX activation, i.e.
+resizing the PRM, and thus EPC, requires rebooting the system.
+
+An autonomous hardware unit called the Memory Encryption Engine (MEE) protects
+the confidentiality, integrity, and freshness of the PRM, e.g. {de,en}crypts
+data as it is read/written from/to DRAM to provide confidentiality.
+
+
+SGX activation
+==============
+
+Before SGX can be fully enabled, e.g. via FEATURE_CONTROL, the platform must
+undergo explicit SGX activation. SGX activation is a mechanism by which the
+processor verifies and locks the platform configuration set by pre-boot
+firmware, e.g. to ensure it satisfies SGX's security requirements. Before
+SGX is activated (and its configuration locked), firmware can modify the
+PRMRRs, e.g. to set the base/size of the PRM and thus EPC, and can also write
+the SGX_LEPUBKEYHASH MSRs. Notably, the latter allows pre-boot firmware to
+lock the SGX_LEPUBKEYHASH MSRs to a non-Intel value by writing the MSRs and
+locking MSR_IA32_FEATURE_CONTROL without setting the "SGX LE pubkey hash
+writable" flag, i.e. making the SGX_LEPUBKEYHASH MSRs readonly.
+
+
+Footnotes
+=========
+
+.. [1] All processors that do not support the SGX2 ISA take an errata and
+ signal #GP(0) instead of #PF(PF_SGX) when vectoring EPCM violations and
+ faults due to enclave-accesses to non-EPC memory.
+
+.. [2] Note that despite being vectored as a #PF, a #PF with PF_SGX has nothing
+ to do with conventional paging.
+
diff --git a/Documentation/x86/sgx/index.rst b/Documentation/x86/sgx/index.rst
new file mode 100644
index 000000000000..c5dfef62e612
--- /dev/null
+++ b/Documentation/x86/sgx/index.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+Software Guard Extensions
+=========================
+
+Intel(R) SGX is a set of architectural extensions that enables applications to
+establish secure containers, a.k.a. enclaves. SGX enclaves provide security
+guarantees such as integrity and confidentiality, even when running on a system
+where privileged software, e.g. kernel, hypervisor, etc... is untrusted and
+potentially malicious.
+
+.. toctree::
+ :maxdepth: 1
+
+ 1.Architecture
--
2.20.1