Re: [PATCH v3 0/6] Static calls
From: Nadav Amit
Date: Thu Jan 10 2019 - 20:47:19 EST
> On Jan 10, 2019, at 4:56 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Thu, Jan 10, 2019 at 3:02 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>> Right, emulating a call instruction from the #BP handler is ugly,
>>> because you have to somehow grow the stack to make room for the return
>>> address. Personally I liked the idea of shifting the iret frame by 16
>>> bytes in the #DB entry code, but others hated it.
>>
>> Yeah, I hated it.
>>
>> But I'm starting to think it's the simplest solution.
>>
>> So still not loving it, but all the other models have had huge issues too.
>
> Putting my maintainer hat on:
>
> I'm okay-ish with shifting the stack by 16 bytes. If this is done, I
> want an assertion in do_int3() or wherever the fixup happens that the
> write isn't overlapping pt_regs (which is easy to implement because
> that code has the relevant pt_regs pointer). And I want some code
> that explicitly triggers the fixup when a CONFIG_DEBUG_ENTRY=y or
> similar kernel is built so that this whole mess actually gets
> exercised. Because the fixup only happens when a
> really-quite-improbable race gets hit, and the issues depend on stack
> alignment, which is presumably why Josh was able to submit a buggy
> series without noticing.
>
> BUT: this is going to be utterly gross whenever anyone tries to
> implement shadow stacks for the kernel, and we might need to switch to
> a longjmp-like approach if that happens.
Here is an alternative idea (although similar to Steven's and my code).
Assume that static calls always clobber R10 and R11 explicitly, as the
calling convention already permits (and as a gcc plugin should allow us to
enforce). Also assume that we hold a table of every source RIP and its
matching target.
Now, in the int3 handler you can take the faulting RIP and search for it in
the "static-calls" table, writing RIP+5 (the return address, i.e. the
address just past the 5-byte call) into R10 and the target into R11. You
make the int3 handler divert code execution by changing pt_regs->rip to
point to a new function that does:
	push R10
	jmp __x86_indirect_thunk_r11
And then you are done. No?
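
To make the handler side concrete, here is a rough userspace sketch of the
lookup and register fixup. Everything in it is an assumption for
illustration only: struct fake_regs stands in for pt_regs, and
struct static_call_site, static_call_int3_fixup() and the linear search are
made-up names and choices, not existing kernel code. It also assumes the RIP
saved by int3 points one byte past the int3 byte, so the call site is the
saved RIP minus 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pt_regs; only the fields this sketch touches. */
struct fake_regs {
	uint64_t ip;
	uint64_t r10;
	uint64_t r11;
};

/* Hypothetical table entry: one patched call site and its current target. */
struct static_call_site {
	uint64_t src_ip;	/* address of the 5-byte call instruction */
	uint64_t target;	/* function the call is supposed to reach */
};

#define CALL_INSN_SIZE 5

/*
 * Returns 1 if the trap hit a known static-call site and regs were rewritten,
 * 0 if the address is not in the table (regs are left untouched).
 *
 * Assumption: regs->ip is the address just after the int3 byte, so the
 * call site itself starts at regs->ip - 1.
 */
static int static_call_int3_fixup(struct fake_regs *regs,
				  const struct static_call_site *tbl,
				  size_t n, uint64_t trampoline)
{
	uint64_t site = regs->ip - 1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].src_ip != site)
			continue;

		regs->r10 = site + CALL_INSN_SIZE;	/* return address */
		regs->r11 = tbl[i].target;		/* call target */
		regs->ip = trampoline;			/* push R10; jmp thunk */
		return 1;
	}
	return 0;
}
```

The trampoline address passed in would be the two-instruction stub above.
A real table would presumably be sorted and searched with a binary search,
but a linear scan keeps the sketch short.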