[PATCH v4 0/6] Basic recovery for machine checks inside SGX

From: Tony Luck
Date: Fri Aug 27 2021 - 15:56:00 EST


Here's version 4 (just 38 more to go if I want to meet the bar set by
the base SGX series :-) )

Changes since v3:

Dave Hansen:
1) Concerns about assigning a default value to the "owner"
   pointer when sgx_alloc_epc_page() is called with a NULL
   value.
   Resolved: Sean provided a patch to fix the only caller that
   was using NULL. I merged it in here.

2) Better commit message to explain why sgx_is_epc_page() is
   exported.
   Done.

3) Unhappy with "void *owner" in struct sgx_epc_page. Would
   be better to use an anonymous union of all the types.
   Done.
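
For readers who have not followed the v3 thread, here is a rough
userspace sketch of what replacing "void *owner" with an anonymous
union can look like. It is not the code from the patch: the struct
name, the member names and the helper functions are all made up for
illustration, and the kernel types are only forward-declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; only pointers are needed. */
struct sgx_encl_page;
struct sgx_va_page;
struct sgx_encl;

/*
 * Sketch of an EPC page descriptor. The layout and member names are
 * assumptions, not taken from the series.
 */
struct epc_page_sketch {
	unsigned int section;
	uint16_t flags;
	union {	/* anonymous union: all members share the same storage */
		struct sgx_encl_page *encl_page;
		struct sgx_va_page *va_page;
		struct sgx_encl *encl;
	};
};

/* Record a regular enclave page as the owner, with a typed pointer. */
static void epc_set_encl_page_owner(struct epc_page_sketch *page,
				    struct sgx_encl_page *owner)
{
	assert(page != NULL);
	page->encl_page = owner;
}

/*
 * All union members are object pointers occupying the same bytes, so
 * checking one of them for NULL tells us whether any owner is set.
 */
static int epc_page_has_owner(const struct epc_page_sketch *page)
{
	return page->encl_page != NULL;
}
```

The point of the union is that each user stores and reads a typed
pointer, so no casts from "void *" are needed at the call sites.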

Sean Christopherson:
1) Races updating bits in flags field.
   Resolved: "poison" is now a separate field.

2) More races when a poison alert arrives while a page is
   being moved on/off a free/dirty list.
   Resolved: Well, mostly. All the run-time changes are now
   done while holding the node->lock. There's a gap while
   moving pages from the dirty list to the free list, but
   that's a short-ish window during boot, and the races are
   mostly harmless. The worst case is that we might call
   __eremove() for a page that just got marked as poisoned,
   but then sgx_free_epc_page() will see the poison flag and
   do the right thing.
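
The locking rule above can be sketched in userspace roughly as
follows. This is only an illustration under stated assumptions: the
atomic_flag spinlock stands in for node->lock, every name is
hypothetical, and parking poisoned pages on a separate list is my
guess at what "the right thing" could be, not a description of the
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct epc_page {
	bool poison;		/* separate field, not a bit in a flags word */
	struct epc_page *next;
};

struct epc_node {
	atomic_flag lock;	/* stand-in for the kernel's node->lock */
	struct epc_page *free_list;
	struct epc_page *poison_list;	/* assumption: poisoned pages are parked here */
	unsigned long nr_free;
};

static void epc_node_init(struct epc_node *node)
{
	atomic_flag_clear(&node->lock);
	node->free_list = NULL;
	node->poison_list = NULL;
	node->nr_free = 0;
}

static void node_lock(struct epc_node *node)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&node->lock, memory_order_acquire))
		;
}

static void node_unlock(struct epc_node *node)
{
	atomic_flag_clear_explicit(&node->lock, memory_order_release);
}

/* Memory-failure path: mark the page while holding the node lock. */
static void epc_mark_poisoned(struct epc_node *node, struct epc_page *page)
{
	assert(node != NULL && page != NULL);
	node_lock(node);
	page->poison = true;
	node_unlock(node);
}

/* Free path: check the poison field under the same lock before listing. */
static void epc_free_page(struct epc_node *node, struct epc_page *page)
{
	assert(node != NULL && page != NULL);
	node_lock(node);
	if (page->poison) {
		page->next = node->poison_list;
		node->poison_list = page;
	} else {
		page->next = node->free_list;
		node->free_list = page;
		node->nr_free++;
	}
	node_unlock(node);
}
```

Because both paths take the same lock, a page marked as poisoned
before it is freed can never end up on the free list in this sketch.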

Jarkko Sakkinen:
1) Use xarray to keep track of which pages are the special
   SGX EPC ones.
   This spawned a short discussion on whether it was overkill.
   But xarray makes the source much simpler, and there are
   improvements in the pipeline for xarray that will make it
   handle this use case more efficiently. So I made this change.

2) Move the sgx debugfs directory under arch_debugfs_dir.
   Done.
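
To illustrate the lookup pattern behind item 1 (store each EPC
section's physical address range once, then ask "is this address
EPC?"), here is a small userspace stand-in. It is deliberately not
an xarray: it is a fixed-size table with a linear search, and every
name in it is hypothetical. In the kernel the same store/load shape
would presumably map onto xa_store_range() and xa_load(), but that
mapping is my assumption here, not a quote from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 8

/* Hypothetical stand-in for an EPC section descriptor. */
struct epc_section {
	int id;
};

struct range_entry {
	uint64_t first;		/* first physical address in the range */
	uint64_t last;		/* last physical address, inclusive */
	struct epc_section *section;
};

struct range_map {
	struct range_entry entries[MAX_RANGES];
	size_t count;
};

/* Record that [first, last] belongs to "section". Returns 0 on success. */
static int range_map_store(struct range_map *map, uint64_t first,
			   uint64_t last, struct epc_section *section)
{
	assert(first <= last);
	if (map->count == MAX_RANGES)
		return -1;
	map->entries[map->count++] = (struct range_entry){ first, last, section };
	return 0;
}

/* Return the section covering paddr, or NULL if none does. */
static struct epc_section *range_map_load(const struct range_map *map,
					  uint64_t paddr)
{
	for (size_t i = 0; i < map->count; i++) {
		const struct range_entry *e = &map->entries[i];

		if (paddr >= e->first && paddr <= e->last)
			return e->section;
	}
	return NULL;
}

/* The question the memory-failure code needs answered. */
static int is_epc_page(const struct range_map *map, uint64_t paddr)
{
	return range_map_load(map, paddr) != NULL;
}
```

The linear search is only there to keep the sketch short; the
attraction of xarray is that the kernel gets the same two-call
interface without open-coding the search.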

Tony Luck (6):
  x86/sgx: Provide indication of life-cycle of EPC pages
  x86/sgx: Add infrastructure to identify SGX EPC pages
  x86/sgx: Initial poison handling for dirty and free pages
  x86/sgx: Add SGX infrastructure to recover from poison
  x86/sgx: Hook sgx_memory_failure() into mainline code
  x86/sgx: Add hook to error injection address validation

 .../firmware-guide/acpi/apei/einj.rst |  19 +++
 arch/x86/include/asm/set_memory.h     |   4 +
 arch/x86/kernel/cpu/sgx/encl.c        |   5 +-
 arch/x86/kernel/cpu/sgx/encl.h        |   2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c       |   2 +-
 arch/x86/kernel/cpu/sgx/main.c        | 140 ++++++++++++++++--
 arch/x86/kernel/cpu/sgx/sgx.h         |  14 +-
 drivers/acpi/apei/einj.c              |   3 +-
 include/linux/mm.h                    |  15 ++
 mm/memory-failure.c                   |  19 ++-
 10 files changed, 196 insertions(+), 27 deletions(-)


base-commit: e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93
--
2.29.2