Re: x86/sgx: uapi change proposal

From: Jarkko Sakkinen
Date: Thu Jan 03 2019 - 10:03:06 EST


On Wed, Jan 02, 2019 at 12:47:52PM -0800, Sean Christopherson wrote:
> On Sat, Dec 22, 2018 at 10:25:02AM +0200, Jarkko Sakkinen wrote:
> > On Sat, Dec 22, 2018 at 10:16:49AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Dec 20, 2018 at 12:32:04PM +0200, Jarkko Sakkinen wrote:
> > > > On Wed, Dec 19, 2018 at 06:58:48PM -0800, Andy Lutomirski wrote:
> > > > > Can one of you explain why SGX_ENCLAVE_CREATE is better than just
> > > > > opening a new instance of /dev/sgx for each enclave?
> > > >
> > > > I think that fits the SCM_RIGHTS scenario better, i.e. you could send
> > > > the enclave to a process that does not necessarily have rights to
> > > > /dev/sgx. It gives a more robust environment to configure SGX.
> > >
> > > Sean, is this why you wanted enclave fd and anon inode and not just use
> > > the address space of /dev/sgx? Just taking notes of all observations.
> > > I'm not sure what your rationale was (maybe it was somewhere). This was
> > > something I made up, and it was a wrong deduction. You can easily
> > > get the same benefit with /dev/sgx associated fd representing the
> > > enclave.
> > >
> > > This all means that for v19 I'm going without enclave fd involved with
> > > fd to /dev/sgx representing the enclave. No anon inodes will be
> > > involved.
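
For reference, the SCM_RIGHTS hand-off discussed above works for any file
descriptor, which is why an fd obtained from /dev/sgx can be passed on just
like an anon inode fd could. A minimal sketch of the mechanism, assuming a
pipe as a stand-in for the enclave fd and doing both ends in one process for
brevity (in practice the receiver would be another process). The names
send_fd(), recv_fd() and demo_roundtrip() are made up for illustration and
nothing here touches SGX:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one int (one fd). */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send 'fd' over the connected AF_UNIX socket 'sock'. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data carries the control message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from 'sock'. Returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe through a socketpair and use the copy. */
int demo_roundtrip(void)
{
	int sv[2], p[2], got, ret = -1;
	char buf[2] = { 0 };

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (send_fd(sv[0], p[1]) == 0 && (got = recv_fd(sv[1])) >= 0) {
		/* Data written via the received fd shows up on the pipe. */
		if (write(got, "ok", 2) == 2 && read(p[0], buf, 2) == 2 &&
		    memcmp(buf, "ok", 2) == 0)
			ret = 0;
		close(got);
	}

	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The receiver never opens the underlying file itself, so it does not need
permission to open it; it only needs the socket.
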
> >
> > Based on these observations I updated the uapi.
> >
> > As far as I'm concerned there has to be a solution to do EPC mapping
> > with a sequence:
> >
> > 1. Ping /dev/kvm to do something.
> > 2. KVM asks SGX core to do something.
> > 3. SGX core does something.
> >
> > I don't care what the something exactly is, but KVM is the only sane
> > place for KVM uapi. I would be surprised if KVM maintainers didn't agree
> > that they don't want to sprinkle KVM uapi to random places in other
> > subsystems.
>
> It's not a KVM uapi.
>
> KVM isn't a hypervisor in the traditional sense. The "real" hypervisor
> lives in userspace, e.g. Qemu, KVM is essentially just a (very fancy)
> driver for hardware accelerators, e.g. VMX. Qemu for example is fully
> capable of running an x86 VM without KVM, it's just substantially slower.
>
> In terms of guest memory, KVM doesn't care or even know what a particular
> region of memory represents or what, if anything, is backing a region in
> the host. There are cases when KVM is made aware of certain aspects of
> guest memory for performance or functional reasons, e.g. emulated MMIO
> and encrypted memory, but in all cases the control logic ultimately
> resides in userspace.
>
> SGX is a weird case because ENCLS can't be emulated in software, i.e.
> exposing SGX to a VM without KVM's help would be difficult. But, it
> wouldn't be impossible, just slow and ugly.
>
> And so, ignoring host oversubscription for the moment, there is no hard
> requirement that SGX EPC can only be exposed to a VM through KVM. In
> other words, allocating and exposing EPC to a VM is orthogonal to KVM
> supporting SGX. Exposing EPC to userspace via /dev/sgx/epc would mean
> that KVM would handle it like any other guest memory region, and all EPC
> related code/logic would reside in the SGX subsystem.

I'm fine doing that if it makes sense. I just don't understand why you
cannot add ioctls to /dev/kvm for allocating the region. Why isn't that
possible? As I said to Andy earlier, adding new device files is easy as
everything related to device creation is nicely encapsulated.

> Oversubscription throws a wrench in the system because ENCLV can only
> be executed post-VMXON and EPC conflicts generate VMX VM-Exits. But
> even then, KVM doesn't need to own the EPC uapi, e.g. it can call into
> the SGX subsystem to handle EPC conflict VM-Exits and the SGX subsystem
> can wrap ENCLV with exception fixup and forcefully reclaim EPC pages if
> ENCLV faults.

If the uapi is *only* for KVM, then KVM should definitely own it. KVM
calling the SGX subsystem on a conflict is just KVM using in-kernel APIs
provided by the SGX core.

> I can't be 100% certain the oversubscription scheme will be sane without
> actually writing the code, but I'd like to at least keep the option open,
> i.e. not structure /dev/sgx/ in such a way that adding e.g. /dev/sgx/epc
> is impossible or ugly.

/Jarkko