[PATCH v21 27/28] docs: x86/sgx: Document kernel internals
From: Jarkko Sakkinen
Date: Sat Jul 13 2019 - 13:13:29 EST
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Document some of the more tricky parts of the kernel implementation
internals.
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
---
Documentation/x86/sgx/2.Kernel-internals.rst | 76 ++++++++++++++++++++
Documentation/x86/sgx/index.rst | 1 +
2 files changed, 77 insertions(+)
create mode 100644 Documentation/x86/sgx/2.Kernel-internals.rst
diff --git a/Documentation/x86/sgx/2.Kernel-internals.rst b/Documentation/x86/sgx/2.Kernel-internals.rst
new file mode 100644
index 000000000000..5c90a65936f2
--- /dev/null
+++ b/Documentation/x86/sgx/2.Kernel-internals.rst
@@ -0,0 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Kernel Internals
+================
+
+CPU configuration
+=================
+
+Because SGX has an ever evolving and expanding feature set, it's possible for
+a BIOS or VMM to configure a system in such a way that not all CPUs are equal,
+e.g. where Launch Control is only enabled on a subset of CPUs. Linux does
+*not* support such a heterogeneous system configuration, nor does it even
+attempt to play nice in the face of a misconfigured system. With the exception
+of Launch Control's hash MSRs, which can vary per CPU, Linux assumes that all
+CPUs have a configuration that is identical to the boot CPU.
+
+EPC management
+==============
+
+Because the kernel can't arbitrarily read EPC memory or share RO backing pages
+between enclaves, traditional memory models such as CoW and fork() do not work
+with enclaves. In other words, the architectural rules of EPC forces it to be
+treated as MAP_SHARED at all times.
+
+The inability to employ traditional memory models also means that EPC memory
+must be isolated from normal memory pools, e.g. attempting to use EPC memory
+for normal mappings would result in faults and/or perceived data corruption.
+Furthermore, EPC is not enumerated by as normal memory, e.g. BIOS enumerates
+EPC as reserved memory in the e820 tables, or not at all. As a result, EPC
+memory is directly managed by the SGX subsystem, e.g. SGX employs VM_PFNMAP to
+manually insert/zap/swap page table entries, and exposes EPC to userspace via
+a well known device, /dev/sgx/enclave.
+
+The net effect is that all enclave VMAs must be MAP_SHARED and are backed by
+a single file, /dev/sgx/enclave.
+
+EPC oversubscription
+====================
+
+SGX allows to have larger enclaves than amount of available EPC by providing a
+subset of leaf instruction for swapping EPC pages to the system memory. The
+details of these instructions are discussed in the architecture document. Due
+to the unique requirements for swapping EPC pages, and because EPC pages do not
+have associated page structures, management of the EPC is not handled by the
+standard memory subsystem.
+
+SGX directly handles swapping of EPC pages, including a thread to initiate the
+reclaiming process and a rudimentary LRU mechanism. When the amount of free EPC
+pages goes below a low watermark the swapping thread starts reclaiming pages.
+The pages that have not been recently accessed (i.e. do not have the A bit set)
+are selected as victim pages. Each enclave holds an shmem file as a backing
+storage for reclaimed pages.
+
+Launch Control
+==============
+
+The current kernel implementation supports only writable MSRs. The launch is
+performed by setting the MSRs to the hash of the public key modulus of the
+enclave signer and a token with the valid bit set to zero. Because kernel makes
+ultimately all the launch decisions token are not needed for anything. We
+don't need or have a launch enclave for generating them as the MSRs must always
+be writable.
+
+Provisioning
+============
+
+The use of provisioning must be controlled because it allows to get access to
+the provisioning keys to attest to a remote party that the software is running
+inside a legit enclave. This could be used by a malware network to ensure that
+its nodes are running inside legit enclaves.
+
+The driver introduces a special device file /dev/sgx/provision and a special
+ioctl SGX_IOC_ENCLAVE_SET_ATTRIBUTE to accomplish this. A file descriptor
+pointing to /dev/sgx/provision is passed to ioctl from which kernel authorizes
+the PROVISION_KEY attribute to the enclave.
diff --git a/Documentation/x86/sgx/index.rst b/Documentation/x86/sgx/index.rst
index c5dfef62e612..5d660e83d984 100644
--- a/Documentation/x86/sgx/index.rst
+++ b/Documentation/x86/sgx/index.rst
@@ -14,3 +14,4 @@ potentially malicious.
:maxdepth: 1
1.Architecture
+ 2.Kernel-internals
--
2.20.1