[PATCH v3 7/7] x86/sgx: Add documentation for SGX memory errors

From: Tony Luck
Date: Wed Jul 28 2021 - 16:47:17 EST


Error handling is a bit different for SGX pages. Add a section describing
how asynchronous and consumed errors are handled, and documenting the two
new debugfs files that show the count and list of pages with uncorrected
memory errors.

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
---
Documentation/x86/sgx.rst | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
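
For anyone wanting to try the two debugfs files from user space, here is a
minimal sketch of a reader. It is not part of the patch. The file formats are
assumptions: poison_page_count is taken to hold a single decimal integer and
poison_page_list one hexadecimal physical address per line. The helper names
(parse_poison_count, parse_poison_list, read_sgx_poison) are hypothetical.

```python
from pathlib import Path

# Directory named in the documentation hunk; the file formats are assumptions.
SGX_DEBUGFS_DIR = Path("/sys/kernel/debug/sgx")


def parse_poison_count(text: str) -> int:
    """Parse poison_page_count, assumed to hold one decimal integer."""
    return int(text.strip())


def parse_poison_list(text: str) -> list[int]:
    """Parse poison_page_list, assumed to hold one hex address per line."""
    # split() breaks on any whitespace, so blank lines are ignored.
    # int(..., 16) accepts each address with or without a "0x" prefix.
    return [int(token, 16) for token in text.split()]


def read_sgx_poison(debug_dir: Path = SGX_DEBUGFS_DIR) -> tuple[int, list[int]]:
    """Read both files; typically needs root and a mounted debugfs."""
    count = parse_poison_count((debug_dir / "poison_page_count").read_text())
    pages = parse_poison_list((debug_dir / "poison_page_list").read_text())
    return count, pages


if __name__ == "__main__":
    try:
        count, pages = read_sgx_poison()
    except OSError as err:
        # Missing files or insufficient permissions end up here.
        print(f"cannot read SGX poison files: {err}")
    else:
        print(f"{count} poisoned SGX page(s)")
        for addr in pages:
            print(f"  {addr:#x}")
```

The parsing is kept separate from the file access so it can be exercised
without SGX hardware. If the real output format differs from the assumptions
above, only the two parse helpers would need to change.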

diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst
index dd0ac96ff9ef..461bd1daa565 100644
--- a/Documentation/x86/sgx.rst
+++ b/Documentation/x86/sgx.rst
@@ -250,3 +250,29 @@ user wants to deploy SGX applications both on the host and in guests
on the same machine, the user should reserve enough EPC (by taking out
total virtual EPC size of all SGX VMs from the physical EPC size) for
host SGX applications so they can run with acceptable performance.
+
+Uncorrected memory errors
+=========================
+Systems that support machine check recovery and have local machine
+check delivery enabled can recover from uncorrected memory errors in
+many situations.
+
+Errors in SGX pages that are not currently in use will prevent those
+pages from being allocated.
+
+When an error is reported asynchronously against an active SGX page, Linux
+simply notes that the page has an error. If the enclave terminates without
+accessing the page, Linux will not return it to the free list for reallocation.
+
+When an uncorrected memory error is consumed from within an enclave, the
+hardware will mark that enclave so that it cannot be re-entered. Linux will
+send a SIGBUS to the current task.
+
+In addition to console log entries from processing the machine check or
+corrected machine check interrupt, Linux also provides debugfs files to
+indicate the number of SGX enclave pages that have reported errors and
+the physical address of each such page:
+
+/sys/kernel/debug/sgx/poison_page_count
+
+/sys/kernel/debug/sgx/poison_page_list
--
2.29.2