Re: [RFC PATCH 01/11] x86: kernel FineIBT

From: Peter Zijlstra
Date: Mon May 09 2022 - 07:22:44 EST


On Sun, May 08, 2022 at 01:29:13AM -0700, Kees Cook wrote:
> On Wed, May 04, 2022 at 08:16:57PM +0200, Peter Zijlstra wrote:
> >                 FineIBT                                 kCFI
> >
> > __fineibt_\hash:
> >         xor     \hash, %r10     # 7
> >         jz      1f              # 2
> >         ud2                     # 2
> > 1:      ret                     # 1
> >         int3                    # 1
> >
> >
> > __cfi_\sym:                                     __cfi_\sym:
> >                                                         int3; int3                      # 2
> >         endbr                   # 4                     mov     \hash, %eax             # 5
> >         call    __fineibt_\hash # 5                     int3; int3                      # 2
> > \sym:                                           \sym:
> >         ...                                             ...
> >
> >
> > caller:                                         caller:
> >         movl    \hash, %r10d    # 6                     cmpl    \hash, -6(%r11)         # 8
> >         sub     $9, %r11        # 4                     je      1f                      # 2
> >         call    *%r11           # 3                     ud2                             # 2
> >         .nop 4                  # 4 (or fixup r11)      call    __x86_indirect_thunk_r11        # 5
>
> This looks good!
>
> And just to double-check my understanding here... \sym is expected to
> start with endbr with IBT + kCFI?

Ah, the thinking was that 'if IBT then FineIBT', so the combination of
kCFI and IBT is of no concern. And since FineIBT will have the ENDBR in
the __cfi_\sym thing, \sym will not need it.

But thinking about this now, I suppose __nocfi call symbols will still
need the ENDBR on \sym itself. Objtool IBT validation would then need to
find either an ENDBR or a matching __cfi_\sym, I suppose.
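
Roughly, that check could look like the user-space sketch below. This is
not objtool code: ibt_target_ok() and has_endbr() are made-up names, and
it looks for the FineIBT __cfi_\sym preamble by byte pattern (endbr64
followed by a call rel32, 9 bytes total as in the table above), whereas a
real implementation would presumably go through the symbol table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 encoding: f3 0f 1e fa */
static const uint8_t endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* Assumed FineIBT preamble: endbr (4 bytes) + call rel32 (5 bytes). */
#define FINEIBT_PREAMBLE_SIZE	9

static bool has_endbr(const uint8_t *p)
{
	return memcmp(p, endbr64, sizeof(endbr64)) == 0;
}

/*
 * Hypothetical helper: is the symbol at @sym_off in @text a valid
 * indirect branch target?  Either \sym itself starts with ENDBR (the
 * __nocfi case), or it is directly preceded by a FineIBT preamble.
 */
static bool ibt_target_ok(const uint8_t *text, size_t len, size_t sym_off)
{
	const uint8_t *pre;

	/* Case 1: ENDBR on \sym itself. */
	if (sym_off + sizeof(endbr64) <= len && has_endbr(text + sym_off))
		return true;

	/* Case 2: matching __cfi_\sym preamble right in front of \sym. */
	if (sym_off < FINEIBT_PREAMBLE_SIZE || sym_off > len)
		return false;

	pre = text + sym_off - FINEIBT_PREAMBLE_SIZE;
	return has_endbr(pre) && pre[4] == 0xe8;	/* e8 = call rel32 */
}
```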

So I was talking to Joao on IRC the other day, and I realized that if
kCFI generates code as per the above, then we can do FineIBT purely
in-kernel. That is, have objtool generate a section of __cfi_\sym
locations, then use the .retpoline_sites and .cfi_sites to rewrite kCFI
into the FineIBT form in multiple passes:

- read all the __cfi_\sym sites and collect all unique hash values

- allocate (module) memory and write __fineibt_\hash functions for each
unique hash value found

- rewrite callers; nop out kCFI

- rewrite all __cfi_\sym

- rewrite all callers

- enable IBT

And the same on module load I suppose.
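
To make the first two passes a bit more concrete, here is a user-space
sketch. None of this is kernel code; the function names are made up, the
hash is assumed to be the imm32 of the `mov \hash, %eax` at offset 3 of
the kCFI preamble, and the thunk encoding (49 81 f2 imm32 for the xor,
74 02 for the jz) is my reading of the byte counts in the table above.

```c
#include <stddef.h>
#include <stdint.h>

#define CFI_HASH_OFFSET		3	/* cc cc b8 <imm32> cc cc */
#define FINEIBT_THUNK_SIZE	13	/* 7 + 2 + 2 + 1 + 1 */

/* Read the little-endian hash from one kCFI preamble; 0 on success. */
static int cfi_read_hash(const uint8_t *preamble, uint32_t *hash)
{
	const uint8_t *p = preamble + CFI_HASH_OFFSET;

	if (preamble[2] != 0xb8)	/* expected: mov imm32, %eax */
		return -1;

	*hash = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		(uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	return 0;
}

/*
 * Pass 1: walk all __cfi_\sym sites and collect the unique hashes.
 * Returns the number of unique hashes, or -1 on an unexpected preamble
 * or when @out is too small.  Quadratic, which is fine for a sketch.
 */
static int cfi_collect_hashes(const uint8_t *const *sites, size_t nr_sites,
			      uint32_t *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_sites; i++) {
		uint32_t hash;
		size_t j;

		if (cfi_read_hash(sites[i], &hash))
			return -1;

		for (j = 0; j < n; j++) {
			if (out[j] == hash)
				break;
		}
		if (j < n)
			continue;	/* already seen */

		if (n == max_out)
			return -1;
		out[n++] = hash;
	}
	return (int)n;
}

/* Pass 2: emit one __fineibt_\hash body into @buf (13 bytes). */
static void fineibt_emit_thunk(uint8_t *buf, uint32_t hash)
{
	buf[0] = 0x49; buf[1] = 0x81; buf[2] = 0xf2;	/* xor $hash, %r10 */
	buf[3] = hash & 0xff;
	buf[4] = (hash >> 8) & 0xff;
	buf[5] = (hash >> 16) & 0xff;
	buf[6] = (hash >> 24) & 0xff;
	buf[7] = 0x74; buf[8] = 0x02;			/* jz 1f */
	buf[9] = 0x0f; buf[10] = 0x0b;			/* ud2 */
	buf[11] = 0xc3;					/* 1: ret */
	buf[12] = 0xcc;					/* int3 */
}
```

The remaining passes (rewriting the callers and the __cfi_\sym preambles)
would need the real text-patching machinery and are not sketched here.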

But I've only thought about this, not actually implemented it, so who
knows what surprises are lurking there :-)

> Random extra thoughts... feel free to ignore. :) Given that both CFI
> schemes depend on an attacker not being able to construct an executable
> memory region that either starts with endbr (for FineIBT) or starts with
> hash & 2 bytes (for kCFI), we should likely take another look at where
> the kernel uses PAGE_KERNEL_EXEC.
>
> It seems non-specialized use is entirely done via module_alloc(). Obviously
> modules need to stay as-is. So we're left with other module_alloc()
> callers: BPF JIT, ftrace, and kprobes.
>
> Perhaps enabling CFI should tie bpf_jit_harden (which performs constant
> blinding) to the value of bpf_jit_enable? (i.e. either use BPF VM which
> reads from non-exec memory, or use BPF JIT with constant blinding.)
>
> I *think* all the kprobes and ftrace stuff ends up using constructed
> direct calls, though, yes? So if we did bounds checking, we could
> "exclude" them as well as the BPF JIT. Though I'm not sure how
> controllable the content written to the kprobes and ftrace regions are,
> though?

Both ftrace and kprobes only write fairly simple trampolines based off of
a template. Neither has indirect calls.

> For exclusion, we could separate actual modules from the other
> module_alloc() users by maybe allocating in opposite directions from the
> randomized offset and check indirect calls against the kernel text bounds
> and the new modules-only bounds. Sounds expensive, though. Maybe PKS,
> but I can't imagine 2 MSR writes per indirect call would be fast. Hmm...

I'm not sure what problem you're trying to solve..