Re: [PATCH v17 18/23] platform/x86: Intel SGX driver

From: Sean Christopherson
Date: Mon Dec 17 2018 - 13:10:00 EST


On Mon, Dec 17, 2018 at 07:49:35PM +0200, Jarkko Sakkinen wrote:
> On Mon, Dec 17, 2018 at 09:31:06AM -0800, Sean Christopherson wrote:
> > This doesn't work as-is. sgx_encl_release() needs to use sgx_free_page()
> > and not __sgx_free_page() so that we get a WARN() if the page can't be
> > freed. sgx_invalidate() needs to use __sgx_free_page() as freeing a page
> > can fail due to running concurrently with reclaim. I'll play around with
> > the code a bit, there's probably a fairly clean way to share code between
> > the two flows.
>
> Hmm... but why issue a warning in that case? It should be legit
> behaviour.

No, EREMOVE should never fail if the enclave is being released, i.e. all
references to the enclave are gone. And failure during sgx_encl_release()
means we leaked an EPC page, which warrants a WARN.

The only legitimate reason __sgx_free_page() can fail in sgx_invalidate()
is because a page might be in the process of being reclaimed. We could
theoretically WARN on EREMOVE failure in that case, but it'd make the code
a little fragile and it's not "fatal" in the sense that we get a second
chance to free the page during sgx_encl_release().

And unless I missed something, using sgx_invalidate() means were' leaking
all sgx_encl_page structs as well as the radix tree entries.

> > sgx_encl_release_worker() calls do_unmap() without checking the validity
> > of the page tables[1]. As is, the code doesn't even guarantee mm_struct
> > itself is valid.
> >
> > The easiest fix I can think of is to add a SGX_ENCL_MM_RELEASED flag
> > that is set along with SGX_ENCL_DEAD in sgx_mmu_notifier_release(), and
> > only call do_unmap() if SGX_ENCL_MM_RELEASED is false. Note that this
> > means we cant unregister the mmu_notifier until after do_unmap(), but
> > that's true no matter what since we're relying on the mmu_notifier to
> > hold a reference to mm_struct. Patch attached.
>
> OK, the fix change makes sense but I'm thinking that would it be a
> better idea just to set mm NULL and check that instead?

That makes sense.