Re: [PATCH] static_call,x86: Robustify trampoline patching

From: Peter Zijlstra
Date: Sat Oct 30 2021 - 04:19:17 EST


On Sat, Oct 30, 2021 at 09:47:58AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 29, 2021 at 10:03:24PM +0200, Peter Zijlstra wrote:
>
> > So I had a bit of a peek at what clang generates:
> >
> > 3fa4: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 3fa7: R_X86_64_32S __SCK__x86_pmu_handle_irq
> > 3fab: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 3fae: R_X86_64_32S __SCT__x86_pmu_handle_irq.cfi_jt
> > 3fb2: e8 00 00 00 00 call 3fb7 <init_hw_perf_events+0x1dc> 3fb3: R_X86_64_PLT32 __static_call_update-0x4
> >
> > So this then gives the trampoline jump table entry to
> > __static_call_update(), with the result that it will rewrite the
> > jump-table entry, not the trampoline!
> >
> > Now it so happens that the trampoline looks *exactly* like the
> > jump-table entry (one jmp.d32 instruction), so in that regards it'll
> > again 'work'.
> >
> > But this is all really, as in *really*, wrong. And I'm really sad I'm
> > the one to have to discover this, even though I've mentioned
> > static_call()s being tricky in previous reviews.
>
> The below makes the clang-cfi build properly sick:
>
> [ 0.000000] trampoline signature fail
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/x86/kernel/static_call.c:65!

So fundamentally I think the whole notion that the address of a function
is something different than 'the address of that function' is an *utter*
fail.

So things like FineIBT use a scheme where they pass a hash value along
with the indirect call, which is veryfied at the other end. Can't we
adjust it like:


foo.cfi:
endbr
xorl $0xdeadbeef, %r10d
jz foo
ud2
nop # make it an even 16 bytes
foo:
# actual function text


Then have the address of foo, be the address of foo, like any normal
sane person would expect. Have direct calls to foo, go to foo, again, as
expected.

When doing an indirect call (to r11, as clang does), then, and only
then, do:

movl $0xdeadbeef, %r10d
subq $0x10, %r11
call *%r11

# if the r11 lives, add:
addq $0x10, %r11


Then only when caller and callee agree 0xdeadbeef is the password, does
the indirect call go through.

Why isn't this a suitable CFI scheme even without IBT?