Re: [PATCH v39 11/24] x86/sgx: Add SGX enclave driver
From: Jarkko Sakkinen
Date: Sun Oct 04 2020 - 10:33:00 EST
On Sat, Oct 03, 2020 at 04:39:25PM +0200, Greg KH wrote:
> On Sat, Oct 03, 2020 at 07:50:46AM +0300, Jarkko Sakkinen wrote:
> > Intel Software Guard eXtensions (SGX) is a set of CPU instructions that can
> > be used by applications to set aside private regions of code and data. The
> > code outside the SGX hosted software entity is prevented from accessing the
> > memory inside the enclave by the CPU. We call these entities enclaves.
> >
> > Add a driver that provides an ioctl API to construct and run enclaves.
> > Enclaves are constructed from pages residing in reserved physical memory
> > areas. The contents of these pages can only be accessed when they are
> > mapped as part of an enclave, by a hardware thread running inside the
> > enclave.
> >
> > The starting state of an enclave consists of a fixed measured set of
> > pages that are copied to the EPC during the construction process by
> > using the opcode ENCLS leaf functions and Software Enclave Control
> > Structure (SECS) that defines the enclave properties.
> >
> > Enclaves are constructed by using ENCLS leaf functions ECREATE, EADD and
> > EINIT. ECREATE initializes SECS, EADD copies pages from system memory to
> > the EPC and EINIT checks a given signed measurement and moves the enclave
> > into a state ready for execution.
> >
> > An initialized enclave can only be accessed through special Thread Control
> > Structure (TCS) pages by using ENCLU (ring-3 only) leaf EENTER. This leaf
> > function converts a thread into enclave mode and continues the execution in
> > the offset defined by the TCS provided to EENTER. An enclave is exited
> > through syscall, exception, interrupts or by explicitly calling another
> > ENCLU leaf EEXIT.
> >
> > The mmap() permissions are capped by the contained enclave page
> > permissions. The mapped areas must also be populated, i.e. each page
> > address must contain a page. This logic is implemented in
> > sgx_encl_may_map().
> >
> > Cc: linux-security-module@xxxxxxxxxxxxxxx
> > Cc: linux-mm@xxxxxxxxx
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Acked-by: Jethro Beekman <jethro@xxxxxxxxxxxx>
> > Tested-by: Jethro Beekman <jethro@xxxxxxxxxxxx>
> > Tested-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
> > Tested-by: Chunyang Hui <sanqian.hcy@xxxxxxxxxx>
> > Tested-by: Jordan Hand <jorhand@xxxxxxxxxxxxxxxxxxx>
> > Tested-by: Nathaniel McCallum <npmccallum@xxxxxxxxxx>
> > Tested-by: Seth Moore <sethmo@xxxxxxxxxx>
> > Tested-by: Darren Kenny <darren.kenny@xxxxxxxxxx>
> > Reviewed-by: Darren Kenny <darren.kenny@xxxxxxxxxx>
> > Co-developed-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > Co-developed-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
> > ---
> > arch/x86/kernel/cpu/sgx/Makefile | 2 +
> > arch/x86/kernel/cpu/sgx/driver.c | 173 ++++++++++++++++
> > arch/x86/kernel/cpu/sgx/driver.h | 29 +++
> > arch/x86/kernel/cpu/sgx/encl.c | 331 +++++++++++++++++++++++++++++++
> > arch/x86/kernel/cpu/sgx/encl.h | 85 ++++++++
> > arch/x86/kernel/cpu/sgx/main.c | 11 +
> > 6 files changed, 631 insertions(+)
> > create mode 100644 arch/x86/kernel/cpu/sgx/driver.c
> > create mode 100644 arch/x86/kernel/cpu/sgx/driver.h
> > create mode 100644 arch/x86/kernel/cpu/sgx/encl.c
> > create mode 100644 arch/x86/kernel/cpu/sgx/encl.h
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
> > index 79510ce01b3b..3fc451120735 100644
> > --- a/arch/x86/kernel/cpu/sgx/Makefile
> > +++ b/arch/x86/kernel/cpu/sgx/Makefile
> > @@ -1,2 +1,4 @@
> > obj-y += \
> > + driver.o \
> > + encl.o \
> > main.o
> > diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> > new file mode 100644
> > index 000000000000..f54da5f19c2b
> > --- /dev/null
> > +++ b/arch/x86/kernel/cpu/sgx/driver.c
> > @@ -0,0 +1,173 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
>
> You use gpl-only header files in this file, so how in the world can it
> be bsd-3 licensed?
>
> Please get your legal department to agree with this, after you explain
> to them how you are mixing gpl2-only code in with this file.
I'll do what I already stated that I will do. Should I do something
more?
> > +// Copyright(c) 2016-18 Intel Corporation.
>
> Dates are hard to get right :(
Will fix.
>
> > +
> > +#include <linux/acpi.h>
> > +#include <linux/miscdevice.h>
> > +#include <linux/mman.h>
> > +#include <linux/security.h>
> > +#include <linux/suspend.h>
> > +#include <asm/traps.h>
> > +#include "driver.h"
> > +#include "encl.h"
> > +
> > +u64 sgx_encl_size_max_32;
> > +u64 sgx_encl_size_max_64;
> > +u32 sgx_misc_reserved_mask;
> > +u64 sgx_attributes_reserved_mask;
> > +u64 sgx_xfrm_reserved_mask = ~0x3;
> > +u32 sgx_xsave_size_tbl[64];
> > +
> > +static int sgx_open(struct inode *inode, struct file *file)
> > +{
> > + struct sgx_encl *encl;
> > + int ret;
> > +
> > + encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> > + if (!encl)
> > + return -ENOMEM;
> > +
> > + atomic_set(&encl->flags, 0);
> > + kref_init(&encl->refcount);
> > + xa_init(&encl->page_array);
> > + mutex_init(&encl->lock);
> > + INIT_LIST_HEAD(&encl->mm_list);
> > + spin_lock_init(&encl->mm_lock);
> > +
> > + ret = init_srcu_struct(&encl->srcu);
> > + if (ret) {
> > + kfree(encl);
> > + return ret;
> > + }
> > +
> > + file->private_data = encl;
> > +
> > + return 0;
> > +}
> > +
> > +static int sgx_release(struct inode *inode, struct file *file)
> > +{
> > + struct sgx_encl *encl = file->private_data;
> > + struct sgx_encl_mm *encl_mm;
> > +
> > + for ( ; ; ) {
> > + spin_lock(&encl->mm_lock);
> > +
> > + if (list_empty(&encl->mm_list)) {
> > + encl_mm = NULL;
> > + } else {
> > + encl_mm = list_first_entry(&encl->mm_list,
> > + struct sgx_encl_mm, list);
> > + list_del_rcu(&encl_mm->list);
> > + }
> > +
> > + spin_unlock(&encl->mm_lock);
> > +
> > + /* The list is empty, ready to go. */
> > + if (!encl_mm)
> > + break;
> > +
> > + synchronize_srcu(&encl->srcu);
> > + mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> > + kfree(encl_mm);
> > + }
> > +
> > + mutex_lock(&encl->lock);
> > + atomic_or(SGX_ENCL_DEAD, &encl->flags);
>
> So you set a flag that this is dead, and then instantly delete it? Why
> does that matter? I see you check for this flag elsewhere, but as you
> are just about to delete this structure, how can this be an issue?
It matters because ksgxswapd (sgx_reclaimer_*) might be processing it.
It will use the flag to skip the operations that it would do to a victim
page, when the enclave is still alive.
>
> > + mutex_unlock(&encl->lock);
> > +
> > + kref_put(&encl->refcount, sgx_encl_release);
>
> Don't you need to hold the lock across the put? If not, what is
> serializing this?
>
> But an even larger comment, why is this reference count needed at all?
>
> You never grab it except at init time, and you free it at close time.
> Why not rely on the reference counting that the vfs ensures you?
Because ksgxswapd needs the alive enclave instance while it is in the
process of swapping a victim page. The reason for this is the
hierarchical nature of the enclave pages.
As an example, a write operation to main memory, EWB (SDM vol 3D 40-79)
needs to access SGX Enclave Control Structure (SECS) page, which is
contains global data for an enclave, like the unswapped child count.
> > + return 0;
> > +}
> > +
> > +static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + struct sgx_encl *encl = file->private_data;
> > + int ret;
> > +
> > + ret = sgx_encl_may_map(encl, vma->vm_start, vma->vm_end, vma->vm_flags);
> > + if (ret)
> > + return ret;
> > +
> > + ret = sgx_encl_mm_add(encl, vma->vm_mm);
> > + if (ret)
> > + return ret;
> > +
> > + vma->vm_ops = &sgx_vm_ops;
> > + vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
> > + vma->vm_private_data = encl;
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long sgx_get_unmapped_area(struct file *file,
> > + unsigned long addr,
> > + unsigned long len,
> > + unsigned long pgoff,
> > + unsigned long flags)
> > +{
> > + if ((flags & MAP_TYPE) == MAP_PRIVATE)
> > + return -EINVAL;
> > +
> > + if (flags & MAP_FIXED)
> > + return addr;
> > +
> > + return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> > +}
> > +
> > +static const struct file_operations sgx_encl_fops = {
> > + .owner = THIS_MODULE,
> > + .open = sgx_open,
> > + .release = sgx_release,
> > + .mmap = sgx_mmap,
> > + .get_unmapped_area = sgx_get_unmapped_area,
> > +};
> > +
> > +static struct miscdevice sgx_dev_enclave = {
> > + .minor = MISC_DYNAMIC_MINOR,
> > + .name = "enclave",
> > + .nodename = "sgx/enclave",
>
> A subdir for a single device node? Ok, odd, but why not just
> "sgx_enclave"? How "special" is this device node?
There is a patch that adds "sgx/provision".
Either works for me. Should I flatten them to "sgx_enclave" and
"sgx_provision", or keep them as they are?
> thanks,
>
> greg k-h
/Jarkko