Re: [PATCH v29 00/20] Intel SGX foundations

From: Sean Christopherson
Date: Thu May 07 2020 - 15:35:03 EST


On Thu, May 07, 2020 at 12:49:15PM -0400, Nathaniel McCallum wrote:
> On Thu, May 7, 2020 at 1:03 AM Haitao Huang
> <haitao.huang@xxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 06 May 2020 17:14:22 -0500, Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > > On Wed, May 06, 2020 at 05:42:42PM -0400, Nathaniel McCallum wrote:
> > >> Tested on Enarx. This requires a patch[0] for v29 support.
> > >>
> > >> Tested-by: Nathaniel McCallum <npmccallum@xxxxxxxxxx>
> > >>
> > >> However, we did uncover a small usability issue. See below.
> > >>
> > >> [0]:
> > >> https://github.com/enarx/enarx/pull/507/commits/80da2352aba46aa7bc6b4d1fccf20fe1bda58662
> > >
> > > ...
> > >
> > >> > * Disallow mmap(PROT_NONE) from /dev/sgx. Any mapping (e.g.
> > >> anonymous) can
> > >> > be used to reserve the address range. Now /dev/sgx supports only
> > >> opaque
> > >> > mappings to the (initialized) enclave data.
> > >>
> > >> The statement "Any mapping..." isn't actually true.

Yeah, this definitely misleading. I haven't looked at our most recent docs,
but I'm going to go out on a limb and assume we haven't documented the
preferred mechanism for carving out virtual memory for the enclave. That
absolutely should be done.

> > >> Enarx creates a large enclave (currently 64GiB). This worked when we
> > >> created a file-backed mapping on /dev/sgx/enclave. However, switching
> > >> to an anonymous mapping fails with ENOMEM. We suspect this is because
> > >> the kernel attempts to allocate all the pages and zero them but there
> > >> is insufficient RAM available. We currently work around this by
> > >> creating a shared mapping on /dev/zero.
> > >
> > > Hmm, the kernel shouldn't actually allocate physical pages unless they're
> > > written. I'll see if I can reproduce.
> > >
> >
> > For larger size mmap, I think it requires enabling vm overcommit mode 1:
> > echo 1 | sudo tee /proc/sys/vm/overcommit_memory

It shouldn't unless the initial mmap is "broken". Not truly broken, but
broken in the sense that what Enarx is asking for is not actually what it
desires.

> Which means the default experience isn't good.

What PROT_* and MAP_* flags are passed to mmap()? Overcommit only applies
to

VM_WRITE (a.k.a. PROT_WRITE) && !VM_SHARED && !VM_NORESERVED

and, ignoring rlimits, VM expansion only applies to

VM_WRITE && !VM_SHARED && !VM_STACK


So hopefully Enarx is doing something like

base = mmap(NULL, 64gb, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

because that means this is effectively a userspace bug. This goes back to
my comment about the mmap() being "broken". Userspace is asking for a
writable, private mapping, in which case it absolutely should be accounted.

If using

base = mmap(NULL, 64gb, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

works, then updating the SGX docs to better explain how to establish ELRANGE
is sufficient (we need to that in any case). If the above still fails then
something else is in play.