Re: [PATCH v3 03/25] x86/sgx: Wipe out EREMOVE from sgx_free_epc_page()

From: Sean Christopherson
Date: Tue Mar 23 2021 - 13:03:23 EST


On Tue, Mar 23, 2021, Paolo Bonzini wrote:
> On 23/03/21 17:06, Borislav Petkov wrote:
> > > Practically speaking, "basic" deployments of SGX VMs will be insulated from
> > > this bug. KVM doesn't support EPC oversubscription, so even if all EPC is
> > > exhausted, new VMs will fail to launch, but existing VMs will continue to chug
> > > along with no ill effects....
> >
> > Ok, so it sounds to me like *at* *least* there should be some writeup in
> > Documentation/ explaining to the user what to do when she sees such an
> > EREMOVE failure, perhaps the gist of this thread and then possibly the
> > error message should point to that doc.
>
> That's important, but it's even more important *to developers* that the
> commit message spells out why this would be a kernel bug more often than
> not. I for one do not understand it, and I suspect I'm not alone.
>
> Maybe (optimistically) once we see that explanation we decide that the
> documentation is not important. Sean, Kai, can you explain it?

Thought of a good analogy that can be used for the changelog and/or docs:

This is effectively a kernel use-after-free of EPC, and due to the way SGX works,
the bug is detected when the page is freed. Rather than add the page back to the
pool of available EPC, the kernel intentionally leaks the page to avoid
additional errors in the future.
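
To make the analogy concrete, here is a minimal userspace sketch of the
"detect at free, then leak" behavior. It is not kernel code: model_page,
model_eremove() and model_free_page() are made-up names, and the
still_referenced flag is only a stand-in for whatever state causes EREMOVE
to fail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one EPC page; not the kernel's struct sgx_epc_page. */
struct model_page {
	bool in_pool;          /* available for allocation again */
	bool still_referenced; /* stand-in for state that makes EREMOVE fail */
	bool leaked;           /* intentionally never returned to the pool */
};

/* Stand-in for EREMOVE: nonzero means the page is still in use. */
int model_eremove(const struct model_page *page)
{
	return page->still_referenced ? -1 : 0;
}

/*
 * Free a page. If the EREMOVE stand-in fails, something still uses the
 * page, i.e. this free is the moment the use-after-free is detected.
 * Returns 0 if the page went back to the pool, -1 if it was leaked.
 */
int model_free_page(struct model_page *page)
{
	if (model_eremove(page)) {
		fprintf(stderr, "EREMOVE failed, leaking page\n");
		page->leaked = true;	/* keep it out of the pool for good */
		return -1;
	}

	page->in_pool = true;
	return 0;
}
```

The point of the leak is that a page in an unknown state never gets handed
to a new user, so the damage stays confined to the one lost page.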

Does that help?