Re: [PATCH 11/14] llvm: kCFI pointer stuff

From: Alexei Starovoitov
Date: Wed Oct 02 2024 - 12:49:11 EST


On Tue, Oct 1, 2024 at 3:21 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>
> Does something like this help?

Yep. Thanks.

> diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
> index 31d19c815f99..b6e7e79e79c6 100644
> --- a/arch/x86/include/asm/cfi.h
> +++ b/arch/x86/include/asm/cfi.h
> @@ -44,11 +44,28 @@
> * call *%r11
> *
> *
> + * IBT+:
> + *
> + * foo:
> + * endbr64 / ud1 0(%eax), %edx
> + * ... code here ...
> + * ret
> + *
> + * direct caller:
> + * call foo+4
> + *
> + * indirect caller:
> + * lea foo(%rip), %r11
> + * ...
> + * call *%r11
> + *
> + *
> * kCFI:
> *
> * __cfi_foo:
> * movl $0x12345678, %eax # kCFI signature hash
> - * # 11 nops when CONFIG_CALL_PADDING
> + * movb $0x12, %al # kCFI pointer argument mask
> + * # 9 nops when CONFIG_CALL_PADDING
> * foo:
> * endbr64 # when IBT
> * ... code here ...
> @@ -91,6 +108,57 @@
> * nop4
> * call *%r11
> *
> + *
> + * FineIBT+:
> + *
> + * __cfi_foo:
> + * endbr64
> + * subl 0x12345678, %r10d
> + * jz foo

should it be 'jz foo_4' ?
Otherwise it will trap after endbr64 sealing.

> + * ud2
> + * nop
> + * foo:
> + * ud1 0(%eax), %edx # was endbr64
> + * foo_4:
> + * ... code here ...
> + * ret
> + *
> + * direct caller:
> + * call foo+4
> + *
> + * indirect caller:
> + * lea foo(%rip), %r11
> + * ...
> + * movl $0x12345678, %r10d
> + * subl $16, %r11
> + * nop4
> + * call *%r11
> + *
> + *
> + * FineIBT-BHI:
> + *
> + * __cfi_foo:
> + * endbr64
> + * subl 0x12345678, %r10d
> + * jz foo-1
> + * ud2
> + * foo-1:
> + * call __bhi_args_XXX # depends on kCFI pointer argument mask
> + * foo+4:
> + * ... code here ...
> + * ret
> + *
> + * direct caller:
> + * call foo+4
> + *
> + * indirect caller:
> + * lea foo(%rip), %r11
> + * ...
> + * movl $0x12345678, %r10d
> + * subl $16, %r11
> + * nop4
> + * call *%r11
> + *
> */
> enum cfi_mode {
> CFI_AUTO, /* FineIBT if hardware has IBT, otherwise kCFI */
>
>
>
>
> > I wonder whether this whole complexity is worth it vs
> > always calling __bhi_args_all()
>
> That's one for Scott to answer; I think always doing _all will hurt
> especially bad because it includes rsp.

But why cmovne %rsp ?
Because some pointers are passed on the stack ?
but %rsp itself isn't involved in speculation.
load/store from stack can still speculate regardless of cmovne %rsp ?
Or is it acting like a barrier for all subsequent access from stack
including things like 'push rbp' in the function prologue?

Overall it feels like a lot of complexity for a 'security checkbox'.
If it hurts perf so much regardless the extra % here and there will
be ignored by security obsessed people and folks who care about
performance won't be enabling it anyway.

btw the alternative to hacking compilers is to get information
about pointers in function arguments from BTF.
It's all there. No need to encode it via movb $0x12, %al.

$ bpftool btf dump file vmlinux format c|grep pick_next_task
struct task_struct * (*pick_next_task)(struct rq *, struct task_struct *);