Re: [PATCH v39 11/24] x86/sgx: Add SGX enclave driver

From: Jarkko Sakkinen
Date: Sun Oct 04 2020 - 19:42:14 EST


On Sun, Oct 04, 2020 at 11:27:50PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 05, 2020 at 12:50:49AM +0300, Jarkko Sakkinen wrote:
> > What is shoukd take is encl->lock.
> >
> > The loop was pre-v36 like:
> >
> > idx_start = PFN_DOWN(start);
> > idx_end = PFN_DOWN(end - 1);
> >
> > for (idx = idx_start; idx <= idx_end; ++idx) {
> > mutex_lock(&encl->lock);
> > page = radix_tree_lookup(&encl->page_tree, idx);
> > mutex_unlock(&encl->lock);
> >
> > if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > return -EACCES;
> > }
> >
> > Looking at xarray.h and filemap.c, I'm thinking something along the
> > lines of:
> >
> > for (idx = idx_start; idx <= idx_end; ++idx) {
> > mutex_lock(&encl->lock);
> > page = xas_find(&xas, idx + 1);
> > mutex_unlock(&encl->lock);
> >
> > if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > return -EACCES;
> > }
> >
> > Does this look about right?
>
> Not really ...
>
> int ret = 0;
>
> mutex_lock(&encl->lock);
> rcu_read_lock();

Right, so xa_*() take RCU lock implicitly and xas_* do not.

> while (xas.index < idx_end) {
> page = xas_next(&xas);

It should iterate through every possible page index within the range,
even the ones that do not have an entry, i.e. this loop also checks
that there are no empty slots.

Does xas_next() go through every possible index, or skip the non-empty
ones?

> if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> ret = -EACCESS;
> break;
> }
> }
> rcu_read_unlock();
> mutex_unlock(&encl->lock);

In my Geminilake NUC the maximum size of the address space is 64GB for
an enclave, and it is not fixed but can grow in microarchitectures
beyond that.

That means that in (*artificial*) worst case the locks would be kept for
64*1024*1024*1024/4096 = 16777216 iterations.

I just realized that in sgx_encl_load_page ([1], the encl->lock is
acquired by the caller) I have used xa_load(), which more or less would
be compatible with the old radix_tree pattern, i.e.

for (idx = idx_start; idx <= idx_end; ++idx) {
mutex_lock(&encl->lock);
page = xas_load(&encl->page_array, idx);
mutex_unlock(&encl->lock);

if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
return -EACCES;
}

To make things stable again, I'll go with this for the immediate future.

> return ret;
>
> ... or you could rework to use the xa_lock instead of encl->lock.
> I don't know how feasible that is for you.

encl->lock is used to protect enclave state but it is true that
page->vm_max_prort_bits is not modified through concurrent access, once
the page is added (e.g. by the reclaimer, which gets pages through
sgx_activate_page_list, not through xarray).

It's an interesting idea, but before even considering it I want to fix
the bug, even if the fix ought to be somehow unoptimal in terms of
performance.

Thanks for helping with this. xarray is still somewhat alien to me and
most of the code I see just use the iterator macros excep mm/*, but
I'm slowly adapting the concepts.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git/tree/arch/x86/kernel/cpu/sgx/encl.c
[2] https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git/tree/arch/x86/kernel/cpu/sgx/main.c

/Jarkko