Re: [PATCH] static_call,x86: Robustify trampoline patching
From: Peter Zijlstra
Date: Tue Nov 02 2021 - 17:20:13 EST
On Tue, Nov 02, 2021 at 11:10:10AM -0700, Kees Cook wrote:
> On Tue, Nov 02, 2021 at 01:57:44PM +0100, Peter Zijlstra wrote:
> > All questions that need answering I think.
>
> I'm totally fine with designing a new CFI for a future option,
> but blocking the existing (working) one does not best serve our end
> users.
Accepting a half arsed CFI now, just because it is what we have, will
only make it ever so much more difficuly to get new/better
things implemented in the toolchains, because the pressure is off.
Also, it will make enabling IBT more difficult, or we'll end up having
CFI and IBT mutually exclusive, which is a crap position to be in.
> There are already people waiting on x86 CFI because having the
> extra layer of defense is valuable for them. No, it's not perfect,
> but it works right now, and evidence from Android shows that it has
> significant real-world defensive value. Some of the more adventurous
> are actually patching their kernels with the CFI support already, and
> happily running their workloads, etc.
It works by accident, not design. Which is a giant red flag in my book.
> Supporting Clang CFI means we actually have something to evolve
> from,
I don't want to evolve from a place that's crazy and we shouldn't have
been in to begin with. Redefining what a function pointer is is insane,
and the required work-arounds are ugly at best.
The ARM64 folks have expressed regret from having merged it. Why should
x86 do something that's known to cause regret?
> where as starting completely over means (likely significant)
> delays leaving folks without the option available at all.
See above. Maybe some of those folks will get motivated to make it
happen faster.
> I think the various compiler and kernel tweaks needed to improve
> kernel support are reasonable, but building a totally new CFI
> implementation is not: it _does_ work today on x86.
By sodding accident; see that one static call patch that makes it burn.
If someone were to use the jump-table entry after we'd scribbled it,
thing will go sideways bad.
> Yes, it's weird, but not outrageously so.
Clearly we disagree.
> Regardless, speaking to a new CFI design below:
>
> > So how insane is something like this, have each function:
> >
> > foo.cfi:
> > endbr64
> > xorl $0xdeadbeef, %r10d
> > jz foo
> > ud2
> > nop # make it 16 bytes
> > foo:
> > # actual function text goes here
> >
> >
> > And for each hash have two thunks:
> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_cfi_deadbeef:
> > movl -9(%r11), %r10 # immediate in foo.cfi
>
> This requires the text be readable. I have been hoping to avoid this for
> a CFI implementation so we could gain the benefit of execute-only
> memory (available soon on arm64, and possible today on x86 under a
> hypervisor).
It's only needed for the 'legacy' case of software only CFI if that
makes you feel better. BTI/IBT based CFI doesn't need this.
(also, I'm very much a bare metal kinda guy)
> > xorl $0xdeadbeef, %r10 # our immediate
> > jz 1f
> > ud2
> > 1: ALTERNATIVE_2 "jmp *%r11",
> > "jmp __x86_indirect_thunk_r11", X86_FEATURE_RETPOLINE
> > "lfence; jmp *%r11", X86_FEATURE_RETPOLINE_AMD
> >
> >
> >
> > # arg: r11
> > # clobbers: r10, r11
> > __x86_indirect_ibt_deadbeef:
> > movl $0xdeadbeef, %r10
> > subq $0x10, %r11
> > ALTERNATIVE "", "lfence", X86_FEATURE_RETPOLINE
> > jmp *%r11
> >
IBT case ^ doesn't actually read from the code. It simply jumps in front
of the real function by a set amount to get the extra cruft /
landing-pad required for indirect calls.
Also note that direct calls never make use of that pad, which means it
can be stripped from all symbols that never have their address taken,
which means less indirect targets.
(Or poison the thing with UD2 and write 0 in the immediate)
> > And have the actual indirect callsite look like:
> >
> > # r11 - &foo
> > ALTERNATIVE_2 "cs call __x86_indirect_thunk_r11",
> > "cs call __x86_indirect_cfi_deadbeef", X86_FEATURE_CFI
> > "cs call __x86_indirect_ibt_deadbeef", X86_FEATURE_IBT
> >
> > Although if the compiler were to emit:
> >
> > cs call __x86_indirect_cfi_deadbeef
> >
> > we could probaly fix it up from there.
>
> It seems like this could work for any CFI implementation, though, if
The strong suggestion is that function pointers are sane, and there's a
few other considerations, but yes.
> the CFI implementation always performed a call, or if the bounds of the
> inline checking were known? i.e. objtool could also find the inline
> checks just as well as the call, though?
Somewhat hard, it's much easier to find a single instruction than a
pattern. Also, footprint. Smaller really is better.
> > Then we can at runtime decide between:
> >
> > {!cfi, cfi, ibt} x {!retpoline, retpoline, retpoline-amd}
>
> This does look pretty powerful, but I still don't think it precludes
> using the existing Clang CFI. I don't want perfect to be the enemy of
> good. :)
As stated above, we're in violent disagreement that the proposed scheme
comes anywhere near good. I utterly hate the redefintion of function
pointers and I also think the range compare goes sideways in the face of
modules. That's two marks against jump-tables.
(Also, in the !CFI case, you can't actually free the jump-tables, since
you're uncondtionally s(t)uck with them because of how &func is
wrecked.)