Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

From: Jarkko Sakkinen
Date: Mon Mar 15 2021 - 09:19:42 EST


On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote:
> On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote:
> > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote:
> > > On Thu, Mar 11, 2021, Kai Huang wrote:
> > > > From: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > > >
> > > > EREMOVE takes a page and removes any association between that page and
> > > > an enclave. It must be run on a page before it can be added into
> > > > another enclave. Currently, EREMOVE is run as part of pages being freed
> > > > into the SGX page allocator. It is not expected to fail.
> > > >
> > > > KVM does not track how guest pages are used, which means that SGX
> > > > virtualization use of EREMOVE might fail.
> > > >
> > > > Break out the EREMOVE call from the SGX page allocator. This will allow
> > > > the SGX virtualization code to use the allocator directly. (SGX/KVM
> > > > will also introduce a more permissive EREMOVE helper).
> > > >
> > > > Implement the original sgx_free_epc_page() as sgx_encl_free_epc_page() to
> > > > make it explicit that it frees an EPC page assigned to an enclave.
> > > > Print an error message when EREMOVE fails, to call out explicitly that the
> > > > EPC page is leaked and that a machine reboot is required to get it back.
> > > >
> > > > Signed-off-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > > > Co-developed-by: Kai Huang <kai.huang@xxxxxxxxx>
> > > > Acked-by: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> > > > Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> > > > ---
> > > > v2->v3:
> > > >
> > > > - Fixed a copy/paste bug which resulted in the SECS page and VA pages not
> > > > being correctly freed in sgx_encl_release() (sorry for the mistake).
> > > > - Added Jarkko's Acked-by.
> > >
> > > That Acked-by should either be dropped or moved above Co-developed-by to make
> > > checkpatch happy.
> > >
> > > Reviewed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> >
> > Oops, my bad. Yup, ack should be removed.
> >
> > /Jarkko
>
> Hi Jarkko,
>
> Your reply to the cover letter, raising your concern about this patch,
>
> https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@xxxxxxxxxx/
>
> reminded me to do more sanity checking of whether removing EREMOVE from
> sgx_free_epc_page() will impact other code paths, and I think
> sgx_encl_release() is not the only place that should be changed:
>
> - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since by the time
> it is called the VA page can already be valid -- other failures can also
> trigger sgx_encl_shrink().

You're right about this, good catch.

Shrink always needs to do EREMOVE, as grow has already done EPA, which
changes the EPC page state.
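
To illustrate, here is a toy userspace model of that grow/shrink sequence.
It is only a sketch, not kernel code: every toy_* name is made up for this
example, and the rule that EPA refuses an already valid page is a
simplification of what the hardware does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel symbols. */
enum toy_state { TOY_UNINIT, TOY_VALID };

struct toy_epc_page {
	enum toy_state state;
};

/* Stands in for EPA: turns an uninitialized page into a valid VA page. */
static int toy_epa(struct toy_epc_page *page)
{
	if (page->state != TOY_UNINIT)
		return -1;	/* page still carries old state */
	page->state = TOY_VALID;
	return 0;
}

/* Stands in for EREMOVE: drops the page back to the uninitialized state. */
static void toy_eremove(struct toy_epc_page *page)
{
	page->state = TOY_UNINIT;
}

/* Stands in for the plain allocator free: the page state is left alone. */
static void toy_free(struct toy_epc_page *page)
{
	(void)page;
}

/* Stands in for the enclave-side wrapper: EREMOVE first, then free. */
static void toy_encl_free(struct toy_epc_page *page)
{
	toy_eremove(page);
	toy_free(page);
}

/*
 * Grow (EPA), shrink, then let the next user try EPA on the same page.
 * Returns what that next EPA returns.
 */
static int toy_grow_shrink_reuse(bool shrink_does_eremove)
{
	struct toy_epc_page page = { TOY_UNINIT };
	int ret = toy_epa(&page);	/* grow */

	assert(ret == 0);
	(void)ret;

	if (shrink_does_eremove)
		toy_encl_free(&page);	/* shrink with EREMOVE */
	else
		toy_free(&page);	/* shrink without EREMOVE */

	return toy_epa(&page);		/* next user of the page */
}
```

In this model toy_grow_shrink_reuse(true) succeeds, while
toy_grow_shrink_reuse(false) hands the next user a page that is still valid.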

> - sgx_encl_add_page() should call sgx_encl_free_epc_page() at the
> "err_out_free:" label, since the EPC page can already be valid when the
> error happens, i.e. when EEXTEND fails.

Yes, correct, good work!

> Other places should be OK per my check, but I'd prefer to just replace all
> sgx_free_epc_page() call sites in the driver with sgx_encl_free_epc_page(),
> with one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when
> EPA fails, in which case EREMOVE is definitely not required.

I would not convert them unless they actually require EREMOVE.

> What do you think?
>
> Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me,
> because virtualization has a counterpart in sgx/virt.c too.

It does make sense to use sgx_free_epc_page() everywhere it's the
right thing to call, and here's why.

If there is some unrelated regression that causes an EPC page not to get
uninitialized when it actually should, doing an extra EREMOVE could mask
that bug. I.e. it can postpone a failure, which can make the bug harder
to trace back.
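
A toy userspace sketch of that masking effect follows. All demo_* names are
hypothetical, and the check in demo_free_strict() only stands in for wherever
the stale page state would first be noticed; it is not a claim that the real
allocator performs such a check.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel symbols. */
enum demo_state { DEMO_UNINIT, DEMO_VALID };

struct demo_page {
	enum demo_state state;
};

/*
 * A teardown path that is supposed to leave the page uninitialized.
 * "regressed" simulates an unrelated bug that skips that step.
 */
static void demo_teardown(struct demo_page *page, bool regressed)
{
	if (!regressed)
		page->state = DEMO_UNINIT;
}

/* Free without EREMOVE: a page that is still valid gets noticed. */
static int demo_free_strict(struct demo_page *page)
{
	return page->state == DEMO_UNINIT ? 0 : -1;
}

/* Free with an unconditional EREMOVE: the stale state is wiped first. */
static int demo_free_with_eremove(struct demo_page *page)
{
	page->state = DEMO_UNINIT;
	return demo_free_strict(page);
}

/* Run teardown followed by one of the two free variants. */
static int demo_run(bool regressed, bool always_eremove)
{
	struct demo_page page = { DEMO_VALID };

	demo_teardown(&page, regressed);
	if (always_eremove)
		return demo_free_with_eremove(&page);
	return demo_free_strict(&page);
}
```

With the regression present, demo_run(true, false) reports the problem right
away, whereas demo_run(true, true) returns success and the bug stays hidden.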

Jarkko