Re: RFC: userspace exception fixups
From: Andy Lutomirski
Date: Tue Nov 06 2018 - 11:57:41 EST
> On Nov 6, 2018, at 7:37 AM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
>> On Fri, 2018-11-02 at 16:32 -0700, Andy Lutomirski wrote:
>>> On Fri, Nov 2, 2018 at 4:28 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>>>
>>>
>>> On Fri, Nov 2, 2018 at 11:04 PM Sean Christopherson
>>> <sean.j.christopherson@xxxxxxxxx> wrote:
>>>>
>>>>> On Fri, Nov 02, 2018 at 08:02:23PM +0100, Jann Horn wrote:
>>>>>
>>>>> On Fri, Nov 2, 2018 at 7:27 PM Sean Christopherson
>>>>> <sean.j.christopherson@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Fri, Nov 02, 2018 at 10:48:38AM -0700, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> This whole mechanism seems very complicated, and it's not clear
>>>>>>> exactly what behavior user code wants.
>>>>>> No argument there. That's why I like the approach of dumping the
>>>>>> exception to userspace without trying to do anything intelligent in
>>>>>> the kernel. Userspace can then do whatever it wants AND we don't
>>>>>> have to worry about mucking with stacks.
>>>>>>
>>>>>> One of the hiccups with the VDSO approach is that the enclave may
>>>>>> want to use the untrusted stack, i.e. the stack that has the VDSO's
>>>>>> stack frame. For example, Intel's SDK uses the untrusted stack to
>>>>>> pass parameters for EEXIT, which means an AEX might occur with what
>>>>>> is effectively a bad stack from the VDSO's perspective.
>>>>> What exactly does "uses the untrusted stack to pass parameters for
>>>>> EEXIT" mean? I guess you're saying that the enclave is writing to
>>>>> RSP+[0...some_positive_offset], and the written data needs to be
>>>>> visible to the code outside the enclave afterwards?
>>>> As is, they actually do it the other way around, i.e. negative offsets
>>>> relative to the untrusted %RSP. Going into the enclave there is no
>>>> reserved space on the stack. The SDK uses EEXIT like a function call,
>>>> i.e. pushing parameters on the stack and making a call outside of the
>>>> enclave, hence the name out-call. This allows the SDK to handle any
>>>> reasonable out-call without a priori knowledge of the application's
>>>> maximum out-call "size".
>>> But presumably this is bounded to be at most 128 bytes (the red zone
>>> size), right? Otherwise this would be incompatible with
>>> non-sigaltstack signal delivery.
>>
>> I think Sean is saying that the enclave also updates RSP.
>
> Yeah, the enclave saves/restores RSP from/to the current save state area.
>
>> One might reasonably wonder how the SDK knows the offset from RSP to
>> the function ID. Presumably using RBP?
>
> Here's pseudocode for how the SDK uses the untrusted stack, minus a
> bunch of error checking and gory details.
>
> The function ID and a pointer to a marshalling struct are passed to
> the untrusted runtime via normal register params, e.g. RDI and RSI.
> The marshalling struct is what's actually allocated on the untrusted
> stack, like alloca() but more complex and explicit. The marshalling
> struct size is not artificially restricted by the SDK, e.g. AFAIK it
> could span multiple 4k pages.
>
>
> int sgx_out_call(const unsigned int func_index, void *marshalling_struct)
> {
> 	struct sgx_encl_tls *tls = get_encl_tls();
>
> 	%RBP = tls->save_state_area[SSA_RBP];
> 	%RSP = tls->save_state_area[SSA_RSP];
> 	%RDI = func_index;
> 	%RSI = marshalling_struct;
>
> 	EEXIT
>
> 	/* magic elsewhere to get back here on an EENTER(OUT_CALL_RETURN) */
> 	return %RAX;
> }
>
> void *sgx_alloc_untrusted_stack(size_t size)
> {
> 	struct sgx_encl_tls *tls = get_encl_tls();
> 	struct sgx_out_call_context *context;
> 	void *tmp;
>
> 	/* create a frame on the trusted stack to hold the out-call context */
> 	tls->trusted_stack -= sizeof(struct sgx_out_call_context);
>
> 	/* save the untrusted %RSP into the out-call context */
> 	context = (struct sgx_out_call_context *)tls->trusted_stack;
> 	context->untrusted_stack = tls->save_state_area[SSA_RSP];
>
> 	/* allocate space on the untrusted stack */
> 	tmp = (void *)(tls->save_state_area[SSA_RSP] - size);
> 	tls->save_state_area[SSA_RSP] = tmp;
>
> 	return tmp;
> }
>
> void sgx_pop_untrusted_stack(void)
> {
> 	struct sgx_encl_tls *tls = get_encl_tls();
> 	struct sgx_out_call_context *context;
>
> 	/* retrieve the current out-call context from the trusted stack */
> 	context = (struct sgx_out_call_context *)tls->trusted_stack;
>
> 	/* restore untrusted %RSP */
> 	tls->save_state_area[SSA_RSP] = context->untrusted_stack;
>
> 	/* pop the out-call context frame */
> 	tls->trusted_stack += sizeof(struct sgx_out_call_context);
> }
>
> int sgx_main(void)
> {
> 	struct my_out_call_struct *params;
> 	int ret;
>
> 	params = sgx_alloc_untrusted_stack(sizeof(*params));
>
> 	params->0..N = XYZ;
>
> 	ret = sgx_out_call(DO_WORK, params);
>
> 	sgx_pop_untrusted_stack();
>
> 	return ret;
> }
So I guess the non-enclave code basically can't trust its stack pointer because of these shenanigans. And the AEP code has to live with the fact that its RSP is basically arbitrary and probably can't even be unwound by a debugger? And the EENTER code has to deal with the fact that its red zone can be blatantly violated by the enclave?
I'm assuming it's way too late for the SGX SDK to be changed to use a normal RPC mechanism? I'm a bit disappointed that enclaves can even manipulate outside state like this. I assume Intel had some reason for making it possible, but still.