Re: [PATCH v4 0/6] Basic recovery for machine checks inside SGX

From: Jarkko Sakkinen
Date: Tue Aug 31 2021 - 22:06:55 EST


On Fri, 2021-08-27 at 12:55 -0700, Tony Luck wrote:
> Here's version 4 (just 38 more to go if I want to meet the bar set by
> the base SGX series :-) )
>
> Changes since v3:
>
> Dave Hansen:
> 1) Concerns about assigning a default value to the "owner"
> pointer if the caller of sgx_alloc_epc_page() called with
> a NULL value.
> Resolved: Sean provided a patch to fix the only caller that
> was using NULL. I merged it in here.
>
> 2) Better commit message to explain why sgx_is_epc_page() is
> exported.
> Done.
>
> 3) Unhappy with "void *owner" in struct sgx_epc_page. Would
> be better to use an anonymous union of all the types.
> Done.
>
> Sean Christopherson:
> 1) Races updating bits in flags field.
> Resolved: "poison" is now a separate field.
>
> 2) More races. When poison alert happens while moving
> a page on/off a free/dirty list.
> Resolved: Well mostly. All the run time changes are now
> done while holding the node->lock. There's a gap while
> moving pages from dirty list to free list. But that's
> a short-ish window during boot, and the races are mostly
> harmless. Worst is that we might call __eremove() for a
> page that just got marked as poisoned. But then
> sgx_free_epc_page() will see the poison flag and do the
> right thing.
>
> Jarkko Sakkinen:
> 1) Use xarray to keep track of which pages are the special
> SGX EPC ones.
> This spawned a short discussion on whether it was overkill. But
> xarray makes the source much simpler, and there are improvements
> in the pipeline for xarray that will make it handle this use
> case more efficiently. So I made this change.
>
> 2) Move the sgx debugfs directory under arch_debugfs_dir.
> Done.
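
To make two of the points above concrete (the anonymous union that
replaces "void *owner", and "poison" as its own field that is only
changed under the node lock), here is a minimal userspace sketch. It
is not the code from the series: struct epc_page, struct epc_node,
the owner types and the epc_*() helpers are hypothetical stand-ins,
and a pthread mutex stands in for node->lock. It only covers the case
where an in-use page is marked poisoned and later freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner types; stand-ins for the real SGX structures. */
struct encl_page { int id; };
struct va_page { int id; };

struct epc_page {
	bool poison;			/* separate field, not a bit in flags */
	unsigned int flags;
	union {				/* typed owners instead of void *owner */
		struct encl_page *encl_owner;
		struct va_page *va_owner;
	};
	struct epc_page *next;		/* simple singly linked list */
};

struct epc_node {
	pthread_mutex_t lock;		/* stand-in for node->lock */
	struct epc_page *free_list;
	struct epc_page *poison_list;
	unsigned long free_cnt;
};

static void epc_node_init(struct epc_node *node)
{
	pthread_mutex_init(&node->lock, NULL);
	node->free_list = NULL;
	node->poison_list = NULL;
	node->free_cnt = 0;
}

/* Record that a page in use has been hit by a memory error. */
static void epc_mark_poison(struct epc_node *node, struct epc_page *page)
{
	pthread_mutex_lock(&node->lock);
	page->poison = true;
	pthread_mutex_unlock(&node->lock);
}

/* Free a page: poisoned pages are parked and never handed out again. */
static void epc_free_page(struct epc_node *node, struct epc_page *page)
{
	assert(page != NULL);

	pthread_mutex_lock(&node->lock);
	page->encl_owner = NULL;	/* clears the whole pointer union */
	if (page->poison) {
		page->next = node->poison_list;
		node->poison_list = page;
	} else {
		page->next = node->free_list;
		node->free_list = page;
		node->free_cnt++;
	}
	pthread_mutex_unlock(&node->lock);
}
```

Because the poison check and the list update happen under the same
lock, a page marked poisoned before it is freed cannot end up on the
free list in this sketch.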
>
> Tony Luck (6):
> x86/sgx: Provide indication of life-cycle of EPC pages
> x86/sgx: Add infrastructure to identify SGX EPC pages
> x86/sgx: Initial poison handling for dirty and free pages
> x86/sgx: Add SGX infrastructure to recover from poison
> x86/sgx: Hook sgx_memory_failure() into mainline code
> x86/sgx: Add hook to error injection address validation
>
> .../firmware-guide/acpi/apei/einj.rst | 19 +++
> arch/x86/include/asm/set_memory.h | 4 +
> arch/x86/kernel/cpu/sgx/encl.c | 5 +-
> arch/x86/kernel/cpu/sgx/encl.h | 2 +-
> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
> arch/x86/kernel/cpu/sgx/main.c | 140 ++++++++++++++++--
> arch/x86/kernel/cpu/sgx/sgx.h | 14 +-
> drivers/acpi/apei/einj.c | 3 +-
> include/linux/mm.h | 15 ++
> mm/memory-failure.c | 19 ++-
> 10 files changed, 196 insertions(+), 27 deletions(-)
>
>
> base-commit: e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93

It would be nice to also CC linux-sgx@xxxxxxxxxxxxxxx on future
versions of this series.

/Jarkko