Re: [PATCH v3 0/2] x86/sgx: Fix two data races in EAUG/EREMOVE flows

From: Dmitrii Kuvaiskii
Date: Fri Jun 07 2024 - 13:55:48 EST


On Tue, May 28, 2024 at 09:01:10AM -0700, Dave Hansen wrote:
> On 5/17/24 04:06, Dmitrii Kuvaiskii wrote:
> > We wrote a trivial stress test to reproduce the hangs observed in
> > real-world applications. The test stresses #PF-based page allocation and
> > SGX_IOC_ENCLAVE_REMOVE_PAGES flows in the SGX driver:
>
> This seems like something we'd want in the kernel SGX selftests.

I looked at tools/testing/selftests/sgx/ and see several
complications:

1. The stress test requires several threads (at least two, ideally
more), but the current SGX selftests are single-threaded. Adding
multi-threading scaffolding to the SGX selftests seems like a
non-trivial task.

2. Catching the data race would require a loop with some iteration
threshold.
- First, none of the current SGX selftests loop like this. Is it
acceptable to add such a test?
- Second, what should the threshold be? I.e., after how many
iterations should we conclude that the data race does not manifest
and report success?
- Third, the data race may hang the test. Is that allowed in
selftests? (The test can have only two outcomes: either it hangs,
meaning the data race is not fixed, or it runs to completion. There
is no result that we could EXPECT or ASSERT on.)
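For illustration only, a minimal userspace sketch (plain pthreads, no kselftest harness, no SGX calls) of how points 1 and 2 could combine: several threads run a racy operation for a bounded number of iterations, and a watchdog deadline turns a hang into a reportable failure instead of a stuck test. The names run_stress(), count_op() and stall_op(), as well as the iteration count and timeout, are hypothetical. It assumes that stuck threads can simply be abandoned because the test process exits right after reporting.

```c
/* Hypothetical sketch: run_stress(), count_op() and stall_op() are not part
 * of tools/testing/selftests/sgx/. Build with: cc -pthread */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* One iteration of the racy op (e.g. a #PF-driven EAUG or an EREMOVE). */
typedef void (*stress_op_t)(void *arg);

struct stress_ctx {
	pthread_mutex_t lock;
	pthread_cond_t done;
	int running;		/* threads that have not finished yet */
	long iters;		/* per-thread iteration threshold */
	stress_op_t op;
	void *arg;
};

static void *stress_thread(void *p)
{
	struct stress_ctx *ctx = p;

	for (long i = 0; i < ctx->iters; i++)
		ctx->op(ctx->arg);
	pthread_mutex_lock(&ctx->lock);
	if (--ctx->running == 0)
		pthread_cond_signal(&ctx->done);
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

/* 0: all threads finished in time; -1: timed out (treat as a hang);
 * -2: thread setup failed. */
static int run_stress(int nthreads, long iters, int timeout_sec,
		      stress_op_t op, void *arg)
{
	struct stress_ctx *ctx;
	pthread_t *tids;
	struct timespec deadline;
	int started, rc = 0;

	assert(nthreads > 0 && timeout_sec > 0);
	ctx = calloc(1, sizeof(*ctx));
	tids = calloc(nthreads, sizeof(*tids));
	if (!ctx || !tids) {
		free(ctx);
		free(tids);
		return -2;
	}
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->done, NULL);
	ctx->iters = iters;
	ctx->op = op;
	ctx->arg = arg;

	/* Hold the lock so no thread decrements 'running' before it is set. */
	pthread_mutex_lock(&ctx->lock);
	for (started = 0; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, stress_thread, ctx))
			break;
	ctx->running = started;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;
	while (ctx->running > 0) {
		if (pthread_cond_timedwait(&ctx->done, &ctx->lock,
					   &deadline) == ETIMEDOUT &&
		    ctx->running > 0) {
			rc = -1;
			break;
		}
	}
	pthread_mutex_unlock(&ctx->lock);

	if (rc == -1) {
		/* Stuck threads still use ctx: leak it, the process exits soon. */
		free(tids);
		return -1;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&ctx->done);
	pthread_mutex_destroy(&ctx->lock);
	free(ctx);
	free(tids);
	return started == nthreads ? 0 : -2;
}

/* Stand-in ops for illustration: one completes, one "hangs". */
static void count_op(void *arg) { atomic_fetch_add((atomic_long *)arg, 1); }

static void stall_op(void *arg)
{
	struct timespec ts = { .tv_sec = 3600 };

	(void)arg;
	nanosleep(&ts, NULL);
}
```

With this shape, run_stress(4, 1000, 10, count_op, &counter) returns 0, while a stalled op makes it return -1 after the deadline, which is something a test could ASSERT on. The iteration count stays arbitrary, so the threshold question above remains open.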

Do we still want to add such a selftest? Or could we piggy-back on
Gramine CI (which will include the test I mentioned in the cover letter)?

--
Dmitrii Kuvaiskii