Re: [PATCH v5 00/15] x86: Add support for Clang CFI
From: Ard Biesheuvel
Date: Wed Oct 27 2021 - 09:30:26 EST
On Wed, 27 Oct 2021 at 15:05, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > > > > Should not this jump-table thingy get converted to an actual function
> > > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > > >
> > > > > https://lkml.kernel.org/r/20211025122102.46089-3-frederic@xxxxxxxxxx
> > > >
> > > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > > >
> > >
> > > Sadly, that only works on symbol names, so we cannot use it to strip
> > > CFI-ness from void *func arguments passed into the static call API,
> > > unfortunately.
> >
> > Right, and while mostly static_call_update() is used, whcih is a macro
> > and could possibly be used to wrap this, we very much rely on
> > __static_call_update() also working without that wrapper and then we're
> > up a creek without no paddles.
>
> Specifically, we support code like:
>
> struct foo {
> void (*func1)(args1);
> void (*func2)(args2);
> }
>
> struct foo global_foo;
>
> ...
>
> DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);
>
> ...
>
> __init foo_init()
> {
> // whatever code that fills out foo
>
> static_call_update(func1, global_foo.func1);
> }
>
> ...
>
> hot_function()
> {
> ...
> static_cal(func1)(args1);
> ...
> }
>
> cold_function()
> {
> ...
> global_foo->func1(args1);
> ...
> }
>
> And there is no way I can see this working sanely with CFI as presented.
>
> Even though the above uses static_call_update(), we can't no longer use
> function_nocfi() on the @func argument, because it's not a symbol, it's
> already a function pointer.
>
> Nor can we fill global_foo.func1 with function_nocfi() because then
> cold_function() goes sideways.
>
As far as I can tell from playing around with Clang, the stubs can
actually be executed directly, they just jumps to the actual function.
The compiler simply generates a jump table for each prototype that
appears in the code as the target of an indirect jump, and checks
whether the target appears in the list.
E.g., the code below
void foo(void) {}
void bar(int) {}
void baz(int) {}
void (* volatile fn1)(void) = foo;
void (* volatile fn2)(int) = bar;
int main(int argc, char *argv[])
{
fn1();
fn2 = baz;
fn2(-1);
}
produces
0000000000400594 <foo.cfi>:
400594: d65f03c0 ret
0000000000400598 <bar.cfi>:
400598: d65f03c0 ret
000000000040059c <baz.cfi>:
40059c: d65f03c0 ret
00000000004005a0 <main>:
4005a0: a9bf7bfd stp x29, x30, [sp, #-16]!
// First indirect call
4005a4: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
4005a8: f9401508 ldr x8, [x8, #40]
4005ac: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005b0: 91182129 add x9, x9, #0x608
4005b4: 910003fd mov x29, sp
4005b8: eb09011f cmp x8, x9
4005bc: 54000241 b.ne 400604 <main+0x64> // b.any
4005c0: d63f0100 blr x8
// Assignment of fn2
4005c4: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005c8: b0000088 adrp x8, 411000 <__libc_start_main@GLIBC_2.17>
4005cc: 91184129 add x9, x9, #0x610
4005d0: f9001909 str x9, [x8, #48]
// Second indirect call
4005d4: f9401908 ldr x8, [x8, #48]
4005d8: 90000009 adrp x9, 400000 <__abi_tag-0x278>
4005dc: 91183129 add x9, x9, #0x60c
4005e0: cb090109 sub x9, x8, x9
4005e4: 93c90929 ror x9, x9, #2
4005e8: f100053f cmp x9, #0x1
4005ec: 540000c8 b.hi 400604 <main+0x64> // b.pmore
4005f0: 12800000 mov w0, #0xffffffff // #-1
4005f4: d63f0100 blr x8
4005f8: 2a1f03e0 mov w0, wzr
4005fc: a8c17bfd ldp x29, x30, [sp], #16
400600: d65f03c0 ret
400604: d4200020 brk #0x1
0000000000400608 <__typeid__ZTSFvvE_global_addr>:
400608: 17ffffe3 b 400594 <foo.cfi>
000000000040060c <__typeid__ZTSFviE_global_addr>:
40060c: 17ffffe3 b 400598 <bar.cfi>
400610: 17ffffe3 b 40059c <baz.cfi>
So it looks like taking the address is fine, although not optimal due
to the additional jump. We could fudge around that by checking the
opcode at the target of the call, or token paste ".cfi" after the
symbol name in the static_call_update() macro, but it doesn't like
like anything is terminally broken tbh.