Because the kernel is untrusted, swapping pages in/out of the Enclave
Page Cache (EPC) has specialized requirements:
* The kernel cannot directly access EPC memory, i.e. cannot copy data
to/from the EPC.
* To evict a page from the EPC, the kernel must "prove" to hardware that
there are no valid TLB entries for said page, since a stale TLB entry would
allow an attacker to bypass SGX access controls.
* When loading a page back into the EPC, hardware must be able to verify
the integrity and freshness of the data.
* When loading an enclave page, e.g. regular pages and Thread Control
Structures (TCS), hardware must be able to associate the page with a
Secure Enclave Control Structure (SECS).
To satisfy the above requirements, the CPU provides dedicated ENCLS
functions to support paging data in/out of the EPC:
* EBLOCK: Mark a page as blocked in the EPC Map (EPCM). Attempting
to access a blocked page that misses the TLB will fault.
* ETRACK: Activate blocking tracking. Hardware verifies that all
translations for pages marked as "blocked" have been flushed
from the TLB.
* EPA: Add version array page to the EPC. As the name suggests, a
VA page is a 512-entry array of version numbers that are
used to uniquely identify pages evicted from the EPC.
* EWB: Write back a page from EPC to memory, e.g. RAM. Software
must supply a VA slot, memory to hold the Paging Crypto
Metadata (PCMD) of the page and, of course, backing memory for
the evicted page.
* ELD{B,U}: Load a page in {un}blocked state from memory to EPC. The
driver only uses the ELDU variant as there is no use case
for loading a page as "blocked" in a bare metal environment.
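
As a rough illustration of how the functions above fit together, the
following is a userspace toy model, not driver or hardware code. Every
model_* name, the error codes and the single "tlb_stale" flag are
assumptions made up for this sketch; real ENCLS leaves take different
operands and track TLB state per logical CPU and per tracking epoch. It
only shows the ordering (EBLOCK, ETRACK, flush, EWB, later ELDU) and how
a VA slot provides freshness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_VA_SLOTS  512   /* 512 eight-byte slots fill one 4 KiB VA page */

_Static_assert(MODEL_VA_SLOTS * sizeof(uint64_t) == MODEL_PAGE_SIZE,
               "a VA page occupies exactly one EPC page");

/* Hypothetical error codes for this model only. */
enum model_err {
    MODEL_OK = 0,
    MODEL_NOT_BLOCKED,
    MODEL_NOT_TRACKED,
    MODEL_SLOT_BUSY,
    MODEL_BAD_VERSION,
};

struct model_enclave {
    bool tracking;   /* set by model_etrack() */
    bool tlb_stale;  /* true while a stale translation may still exist */
};

struct model_epc_page {
    struct model_enclave *encl;
    bool valid;      /* page currently resides in the modelled EPC */
    bool blocked;    /* set by model_eblock() */
};

struct model_va_page { uint64_t slot[MODEL_VA_SLOTS]; };  /* 0 means free */
struct model_backing { uint64_t version; };  /* stands in for PCMD + data */

static uint64_t model_next_version = 1;

static void model_eblock(struct model_epc_page *p) { p->blocked = true; }
static void model_etrack(struct model_enclave *e) { e->tracking = true; }
/* Stands in for software forcing enclave threads out so TLBs are flushed. */
static void model_flush_tlbs(struct model_enclave *e) { e->tlb_stale = false; }

/* Evict: only succeeds once the page is blocked, tracked and flushed. */
static enum model_err model_ewb(struct model_epc_page *p,
                                struct model_va_page *va, unsigned int idx,
                                struct model_backing *b)
{
    assert(idx < MODEL_VA_SLOTS);
    if (!p->blocked)
        return MODEL_NOT_BLOCKED;
    if (!p->encl->tracking || p->encl->tlb_stale)
        return MODEL_NOT_TRACKED;
    if (va->slot[idx] != 0)
        return MODEL_SLOT_BUSY;
    va->slot[idx] = model_next_version++;  /* version stays inside the EPC */
    b->version = va->slot[idx];            /* copy travels with the backing */
    p->valid = false;
    p->blocked = false;
    return MODEL_OK;
}

/* Reload (unblocked): the VA slot must match, then it is consumed. */
static enum model_err model_eldu(struct model_epc_page *p,
                                 struct model_va_page *va, unsigned int idx,
                                 struct model_backing *b)
{
    assert(idx < MODEL_VA_SLOTS);
    if (b->version == 0 || va->slot[idx] != b->version)
        return MODEL_BAD_VERSION;  /* stale or replayed backing data */
    va->slot[idx] = 0;             /* slot is free again, replay now fails */
    p->valid = true;
    p->blocked = false;
    return MODEL_OK;
}
```

In this model, calling model_ewb() before model_eblock() or before the
flush is refused, and a second model_eldu() from the same backing data
is refused because the VA slot was cleared by the first load.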
To top things off, all of the above ENCLS functions are subject to
strict concurrency rules, e.g. many operations will #GP fault if two
or more operations attempt to access common pages/structures.
To put it succinctly, paging in/out of the EPC requires coordinating
with the SGX driver where all of an enclave's tracking resides. But,
simply shoving all reclaim logic into the driver is not desirable as
doing so has unwanted long term implications:
* Oversubscribing EPC to KVM guests, i.e. virtualizing SGX in KVM and
swapping a guest's EPC pages (without the guest's cooperation) needs
the same high level flows for reclaim but has painfully different
semantics in the details.
* Accounting EPC, i.e. adding an EPC cgroup controller, is desirable
as EPC is effectively a specialized memory type and even more scarce
than system memory. Providing a single touchpoint for EPC accounting
regardless of end consumer greatly simplifies the EPC controller.
* Allowing the userspace-facing driver to be built as a loadable module
is desirable, e.g. for debug, testing and development. The cgroup
infrastructure does not support dependencies on loadable modules.
* Separating EPC swapping from the driver once it has been tightly
coupled to the driver is non-trivial (speaking from experience).