Re: [PATCH] static_call,x86: Robustify trampoline patching
From: Peter Zijlstra
Date: Tue Nov 02 2021 - 08:58:19 EST
On Mon, Nov 01, 2021 at 03:14:41PM +0100, Ard Biesheuvel wrote:
> On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > How is that not true for the jump table approach? Like I showed earlier,
> > it is *trivial* to reconstruct the actual function pointer from a
> > jump-table entry pointer.
> >
>
> That is not the point. The point is that Clang instruments every
> indirect call that it emits, to check whether the type of the jump
> table entry it is about to call matches the type of the caller. IOW,
> the indirect calls can only branch into jump tables, and all jump
> table entries in a table each branch to the start of some function of
> the same type.
>
> So the only thing you could achieve by adding or subtracting a
> constant value from the indirect call address is either calling
> another function of the same type (if you are hitting another entry in
> the same table), or failing the CFI type check.
Ah, I see, so the call-site needs to have a branch around the indirect
call instruction.
> Instrumenting the callee only needs something like BTI, and a
> consistent use of the landing pads to ensure that you cannot trivially
> omit the check by landing right after it.
That does bring up another point tho; how are we going to do a kernel
that's optimal for both software CFI and hardware aided CFI?
All questions that need answering I think.
So how insane is something like this, have each function:
foo.cfi:
	endbr64
	xorl	$0xdeadbeef, %r10d
	jz	foo
	ud2
	nop			# make it 16 bytes
foo:
	# actual function text goes here
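
For reference, a small C sketch of how that preamble lays out in bytes, and
why the hash immediate ends up 9 bytes before foo. The helper names are made
up for illustration; the encodings are the usual ones for these instructions
(endbr64 is 4 bytes, the xorl with a 32-bit immediate on %r10d is 7 bytes),
but treat this as a model, not as something the toolchain produces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
	CFI_PREAMBLE_SIZE = 16,	/* bytes from foo.cfi to foo */
	CFI_HASH_OFFSET   = -9,	/* hash immediate, relative to foo */
};

/* Hypothetical helper: write the 16-byte foo.cfi preamble for @hash. */
static void cfi_emit_preamble(uint8_t buf[CFI_PREAMBLE_SIZE], uint32_t hash)
{
	static const uint8_t tmpl[CFI_PREAMBLE_SIZE] = {
		0xf3, 0x0f, 0x1e, 0xfa,		/* endbr64 */
		0x41, 0x81, 0xf2, 0, 0, 0, 0,	/* xorl $imm32, %r10d */
		0x74, 0x03,			/* jz foo (skips ud2 + nop) */
		0x0f, 0x0b,			/* ud2 */
		0x90,				/* nop, pads to 16 bytes */
	};

	assert(buf != NULL);
	memcpy(buf, tmpl, sizeof(tmpl));
	/* The imm32 starts at offset 7 and is stored little-endian. */
	for (int i = 0; i < 4; i++)
		buf[7 + i] = (uint8_t)(hash >> (8 * i));
}

/* Hypothetical helper: read the hash the way "movl -9(%r11)" would. */
static uint32_t cfi_read_hash(const uint8_t *func)
{
	const uint8_t *p = func + CFI_HASH_OFFSET;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

With foo at offset 16, the immediate occupies offsets 7..10, i.e. foo - 9.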
And for each hash have two thunks:
# arg: r11
# clobbers: r10, r11
__x86_indirect_cfi_deadbeef:
	movl	-9(%r11), %r10d		# immediate in foo.cfi
	xorl	$0xdeadbeef, %r10d	# our immediate
	jz	1f
	ud2
1:	ALTERNATIVE_2 "jmp *%r11",
		      "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE,
		      "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
# arg: r11
# clobbers: r10, r11
__x86_indirect_ibt_deadbeef:
	movl	$0xdeadbeef, %r10d
	subq	$0x10, %r11
	ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
	jmp	*%r11
And have the actual indirect callsite look like:
	# r11 - &foo
	ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
		      "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI,
		      "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT
Although if the compiler were to emit:
	cs call __x86_indirect_cfi_deadbeef
we could probably fix it up from there.
Then we can at runtime decide between:
{!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}
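
Spelled out as a C sketch, that is nine boot-time combinations, with the
call-site target depending only on the CFI mode. The enum and function names
below are invented for illustration; the thunk symbol names are the ones used
in the sketches above.

```c
#include <assert.h>
#include <stddef.h>

enum cfi_mode { CFI_OFF, CFI_SW, CFI_IBT, CFI_MODES };
enum retpoline_mode {
	RETPOLINE_OFF, RETPOLINE_GENERIC, RETPOLINE_AMD, RETPOLINE_MODES
};

/* One runtime configuration: a CFI flavour times a retpoline flavour. */
struct boot_config {
	enum cfi_mode cfi;
	enum retpoline_mode retpoline;
};

/* Which thunk the patched call site ends up calling for a given mode. */
static const char *callsite_target(enum cfi_mode cfi)
{
	switch (cfi) {
	case CFI_SW:
		return "__x86_indirect_cfi_deadbeef";
	case CFI_IBT:
		return "__x86_indirect_ibt_deadbeef";
	default:
		return "__x86_indirect_thunk_r11";
	}
}

/* Fill @out with every combination, up to @max; returns how many. */
static size_t enumerate_configs(struct boot_config *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	for (int c = 0; c < CFI_MODES; c++) {
		for (int r = 0; r < RETPOLINE_MODES; r++) {
			if (n < max) {
				out[n].cfi = (enum cfi_mode)c;
				out[n].retpoline = (enum retpoline_mode)r;
				n++;
			}
		}
	}
	return n;
}
```

The retpoline flavour is then handled inside the chosen thunk by its own
ALTERNATIVE, so the call site itself needs only the one three-way patch.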