Re: [PATCH v17 18/23] platform/x86: Intel SGX driver

From: Sean Christopherson
Date: Tue Dec 18 2018 - 10:44:22 EST

On Mon, Dec 17, 2018 at 08:59:54PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 17, 2018 at 2:20 PM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > My brain is still sorting out the details, but I generally like the idea
> > of allocating an anon inode when creating an enclave, and exposing the
> > other ioctls() via the returned fd. This is essentially the approach
> > used by KVM to manage multiple "layers" of ioctls across KVM itself, VMs
> > and vCPUS. There are even similarities to accessing physical memory via
> > multiple disparate domains, e.g. host kernel, host userspace and guest.
> >
> In my mind, opening /dev/sgx would give you the requisite inode. I'm
> not 100% sure that the chardev infrastructure allows this, but I think
> it does.

My fd/inode knowledge is lacking, to say the least. Whatever works, so
long as we have a way to uniquely identify enclaves.

> > The only potential hiccup I can see is the build flow. Currently,
> > EADD+EEXTEND is done via a work queue to avoid major performance issues
> > (10x regression) when userspace is building multiple enclaves in parallel
> > using goroutines to wrap Cgo (the issue might apply to any M:N scheduler,
> > but I've only confirmed the Golang case). The issue is that allocating
> > an EPC page acts like a blocking syscall when the EPC is under pressure,
> > i.e. an EPC page isn't immediately available. This causes Go's scheduler
> > to thrash and tank performance[1].
> What's the issue, and how does a workqueue help? I'm wondering if a
> nicer solution would be an ioctl to add lots of pages in a single
> call.

Adding pages via workqueue makes the ioctl itself fast enough to avoid
triggering Go's rescheduling. A batched EADD flow would likely help,
I just haven't had the time to rework the userspace side to be able to
test the performance.

> >
> > Alternatively, we could change the EADD+EEXTEND flow to not insert the
> > added page's PFN into the owner's process space, i.e. force userspace to
> > fault when it runs the enclave. But that only delays the issue because
> > eventually we'll want to account EPC pages, i.e. add a cgroup, at which
> > point we'll likely need current->mm anyways.
> You should be able to account the backing pages to a cgroup without
> actually sticking them into the EPC, no? Or am I misunderstanding? I
> guess we'll eventually want a cgroup to limit use of the limited EPC
> resources.

It's the latter, a cgroup to limit EPC. The mm is used to retrieve the
cgroup without having track e.g. the task_struct.