Re: RFC: userspace exception fixups

From: Andy Lutomirski
Date: Tue Nov 06 2018 - 18:01:02 EST

Next message: Mark Salyzyn: "[PATCH v8 1/2] overlayfs: check CAP_DAC_READ_SEARCH before issuing exportfs_decode_fh"
Previous message: Thomas Gleixner: "Re: [Patch v4] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated"
In reply to: Sean Christopherson: "Re: RFC: userspace exception fixups"
Next in thread: Sean Christopherson: "Re: RFC: userspace exception fixups"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

>> On Nov 6, 2018, at 1:59 PM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>>
>>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
>>>> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>>> On Nov 6, 2018, at 1:00 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 11/6/18 12:12 PM, Andy Lutomirski wrote:
>>>>> True, but what if we have a nasty enclave that writes to memory just
>>>>> below SP *before* decrementing SP?
>>>> Yeah, that would be unfortunate. If an enclave did this (roughly):
>>>>
>>>> 1. EENTER
>>>> 2. Hardware sets eenter_hwframe->sp = %sp
>>>> 3. Enclave runs... wants to do out-call
>>>> 4. Enclave sets up parameters:
>>>> memcpy(&eenter_hwframe->sp[-offset], arg1, size);
>>>> ...
>>>> 5. Enclave sets eenter_hwframe->sp -= offset
>>>>
>>>> If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
>>>> was on the stack. The enclave could easily fix this by moving ->sp first.
>>>>
>>>> But, this is one of those "fun" parts of the ABI that I think we need to
>>>> talk about. If we do this, we also basically require that the code
>>>> which handles asynchronous exits must *not* write to the stack. That's
>>>> not hard because it's typically just a single ERESUME instruction, but
>>>> it *is* a requirement.
>>> I was assuming that the async exit stuff was completely hidden by the
>>> API. The AEP code would decide whether the exit got fixed up by the
>>> kernel (which may or may not be easy to tell -- can the code even tell
>>> without kernel help whether it was, say, an IRQ vs #UD?) and then
>>> either do ERESUME or cause sgx_enter_enclave() to return with an
>>> appropriate return value.
>> Sean, how does the current SDK AEX handler decide whether to do
>> EENTER, ERESUME, or just bail and consider the enclave dead? It seems
>> like the *CPU* could give a big hint, but I don't see where there is
>> any architectural indication of why the AEX code got called or any
>> obvious way for the user code to know whether the exit was fixed up by
>> the kernel?
>
> The SDK "unconditionally" does ERESUME at the AEP location, but that's
> a bit misleading because its signal handler may muck with the context's
> RIP, e.g. to abort the enclave on a fatal fault.
>
> On an event/exception from within an enclave, the event is immediately
> delivered after loading synthetic state and changing RIP to the AEP.
> In other words, jamming CPU state is essentially a bunch of vectoring
> ucode preamble, but from software's perspective it's a normal event
> that happens to point at the AEP instead of somewhere in the enclave.
> And because the signals the SDK cares about are all synchronous, the
> SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> resides in its signal handler. IRQs and whatnot simply trampoline back
> into the enclave.
>
> Userspace can do something funky instead of ERESUME, but only *after*
> IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> case, after the trap handler has run.
>
> Jumping back a bit, how much do we care about preventing userspace
> from doing stupid things?

My general feeling is that userspace should be allowed to do apparently stupid things. For example, as far as the kernel is concerned, Wine and DOSEMU are just user programs that do stupid things. Linux generally tries to provide a reasonably complete view of architectural behavior. This is in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may cause very odd behavior indeed. So magic fixups that do non-architectural things are not so great.

The flip side, of course, is that the architecture is arguably inherently erratic here, and it's apparently impossible to have an SGX library with sane semantics without some kernel assistance.

So if we can make my straw man API work, perhaps with vDSO or rseq-like help, then the official SDK can use it, but less well behaved programs can still mostly work. (Modulo Linux's non-support for EINITTOKEN, of course.)

Thinking about it some more, the major sticking point may be finding the RIP and stack frame of EENTER in the AEP code or in its fixup. The vDSO can't use TLS without serious hackery. We could massively abuse WRFSBASE, but that's really ugly.

(How does the Windows case work? If there's an exception after the untrusted stack allocation and before EEXIT and SEH tries to handle it, how does the unwinder figure out where to start?)

> I did a quick POC on the idea of hardcoding
> fixup for the ENCLU opcode, and the basic idea checks out. The code
> is fairly minimal and doesn't impact the core functionality of the SDK.
> They'd need to redo their trap handling to move it from the signal
> handler to inline, but their stack shenanigans won't be any more broken
> than they already are.
