Re: x86/sgx: uapi change proposal

From: Sean Christopherson
Date: Wed Jan 02 2019 - 15:47:56 EST


On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > opening a new instance of /dev/sgx for each enclave?
> > >
> > > I think that fits the SCM_RIGHTS scenario better, i.e. you could send
> > > the enclave to a process that does not necessarily have rights to
> > > /dev/sgx. It gives a more robust environment to configure SGX.
> >
> > Sean, is this why you wanted enclave fd and anon inode and not just use
> > the address space of /dev/sgx? Just taking notes of all observations.
> > I'm not sure what your rationale was (maybe it was somewhere). This was
> > something I made up, and it is a wrong deduction. You can easily
> > get the same benefit with a /dev/sgx-associated fd representing the
> > enclave.
> >
> > This all means that for v19 I'm going without a separate enclave fd;
> > the fd to /dev/sgx will represent the enclave. No anon inodes will be
> > involved.
>
> Based on these observations I updated the uapi.
>
> As far as I'm concerned there has to be a solution to do EPC mapping
> with a sequence:
>
> 1. Ping /dev/kvm to do something.
> 2. KVM asks SGX core to do something.
> 3. SGX core does something.
>
> I don't care what the something exactly is, but KVM is the only sane
> place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> that they don't want to sprinkle KVM uapi to random places in other
> subsystems.

It's not a KVM uapi.

KVM isn't a hypervisor in the traditional sense. The "real" hypervisor
lives in userspace, e.g. Qemu; KVM is essentially just a (very fancy)
driver for hardware accelerators, e.g. VMX. Qemu, for example, is fully
capable of running an x86 VM without KVM; it's just substantially slower.

In terms of guest memory, KVM doesn't care or even know what a particular
region of memory represents or what, if anything, is backing a region in
the host. There are cases when KVM is made aware of certain aspects of
guest memory for performance or functional reasons, e.g. emulated MMIO
and encrypted memory, but in all cases the control logic ultimately
resides in userspace.

SGX is a weird case because ENCLS can't be emulated in software, i.e.
exposing SGX to a VM without KVM's help would be difficult. But, it
wouldn't be impossible, just slow and ugly.

And so, ignoring host oversubscription for the moment, there is no hard
requirement that SGX EPC can only be exposed to a VM through KVM. In
other words, allocating and exposing EPC to a VM is orthogonal to KVM
supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean
that KVM would handle it like any other guest memory region, and all EPC
related code/logic would reside in the SGX subsystem.
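
To make "like any other guest memory region" concrete, here is a rough
userspace sketch of what a VMM might do. KVM_SET_USER_MEMORY_REGION and
struct kvm_userspace_memory_region are the existing KVM uapi; everything
about /dev/sgx/epc is an assumption, since that node is only a proposal
here. In particular, that it could be mmap()ed at offset 0 like this is
a guess, and the helper names epc_memslot() and expose_epc_to_guest() are
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Describe an EPC range as an ordinary KVM memory slot. Nothing in the
 * descriptor marks the range as EPC; KVM only sees a host virtual
 * address, a size, and a guest physical address.
 */
struct kvm_userspace_memory_region
epc_memslot(uint32_t slot, uint64_t guest_phys_addr, uint64_t size,
	    void *host_va)
{
	struct kvm_userspace_memory_region r;

	memset(&r, 0, sizeof(r));
	r.slot = slot;
	r.flags = 0;
	r.guest_phys_addr = guest_phys_addr;
	r.memory_size = size;
	r.userspace_addr = (uint64_t)(uintptr_t)host_va;
	return r;
}

/*
 * vm_fd:  fd returned by KVM_CREATE_VM.
 * epc_fd: fd from opening the hypothetical /dev/sgx/epc node; the mmap()
 *         semantics assumed here are not defined anywhere yet.
 * Returns 0 on success, -1 on failure.
 */
int expose_epc_to_guest(int vm_fd, int epc_fd, uint32_t slot,
			uint64_t guest_phys_addr, size_t size)
{
	struct kvm_userspace_memory_region r;
	void *hva;

	/* The SGX subsystem would own allocation behind this mapping. */
	hva = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, epc_fd, 0);
	if (hva == MAP_FAILED)
		return -1;

	/* From KVM's point of view this is just another memslot. */
	r = epc_memslot(slot, guest_phys_addr, size, hva);
	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0) {
		munmap(hva, size);
		return -1;
	}
	return 0;
}
```

The point of the sketch is that the KVM side needs no SGX-specific
ioctl; the only SGX-aware step is obtaining the mapping.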

Oversubscription throws a wrench in the system because ENCLV can only
be executed post-VMXON and EPC conflicts generate VMX VM-Exits. But
even then, KVM doesn't need to own the EPC uapi, e.g. it can call into
the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem
can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if
ENCLV faults.
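
As a toy illustration of that split, the userspace model below has the
"KVM" side forward an EPC conflict exit without holding any EPC policy,
while the "SGX" side wraps a stand-in for ENCLV and falls back to a
forced reclaim when it faults. Every name, type, and return value is
hypothetical; none of this is kernel code, and the real flow would be
considerably more involved.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not a kernel API. */

enum epc_state { EPC_FREE, EPC_GUEST };

struct epc_page {
	enum epc_state state;
	bool conflict;		/* guest hit an EPC conflict on this page */
};

struct host {
	bool vmxon;		/* ENCLV can only be executed post-VMXON */
	int forced_reclaims;	/* pages taken back without ENCLV's help */
};

/* Stand-in for ENCLV with exception fixup: a fault becomes an error. */
static int sgx_enclv(const struct host *h)
{
	return h->vmxon ? 0 : -1;
}

/* SGX side: take a page back from a guest, by force if ENCLV faults. */
int sgx_reclaim_guest_page(struct host *h, struct epc_page *page)
{
	if (page->state != EPC_GUEST)
		return -1;
	if (sgx_enclv(h) != 0)
		h->forced_reclaims++;
	page->state = EPC_FREE;
	return 0;
}

/* SGX side: resolve an EPC conflict that KVM reported. */
int sgx_handle_epc_conflict(struct epc_page *page)
{
	if (!page->conflict)
		return -1;
	page->conflict = false;
	return 0;
}

enum exit_reason { EXIT_OTHER, EXIT_EPC_CONFLICT };

/* KVM side: forwards the exit and keeps no EPC logic of its own. */
int kvm_handle_exit(enum exit_reason reason, struct epc_page *page)
{
	if (reason == EXIT_EPC_CONFLICT)
		return sgx_handle_epc_conflict(page);
	return 1;	/* not an SGX exit; handled elsewhere */
}
```

In this model the only thing KVM knows about SGX is which function to
call for one exit reason, which is the property the paragraph above is
arguing for.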

I can't be 100% certain the oversubscription scheme will be sane without
actually writing the code, but I'd like to at least keep the option open,
i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc
is impossible or ugly.