Re: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Kees Cook
Date: Wed Oct 27 2021 - 13:11:33 EST


On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 02:48:52PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 02:22:27PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 27 Oct 2021 at 14:05, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > > > > Should not this jump-table thingy get converted to an actual function
> > > > > address somewhere around arch_static_call_transform() ? This also seems
> > > > > relevant for arm64 (which already has CLANG_CFI supported) given:
> > > > >
> > > > > https://lkml.kernel.org/r/20211025122102.46089-3-frederic@xxxxxxxxxx
> > > >
> > > > Ugh, yeah, we'll need to do the function_nocfi() dance somewhere...
> > > >
> > >
> > > Sadly, that only works on symbol names, so we cannot use it to strip
> > > CFI-ness from void *func arguments passed into the static call API,
> > > unfortunately.
> >
> > Right, and while mostly static_call_update() is used, whcih is a macro
> > and could possibly be used to wrap this, we very much rely on
> > __static_call_update() also working without that wrapper and then we're
> > up a creek without no paddles.
>
> Specifically, we support code like:
>
> struct foo {
> void (*func1)(args1);
> void (*func2)(args2);
> }
>
> struct foo global_foo;

And global_foo is intentionally non-const?

>
> ...
>
> DEFINE_STATIC_CALL_NULL(func1, *global_foo.func1);
>
> ...
>
> __init foo_init()
> {
> // whatever code that fills out foo
>
> static_call_update(func1, global_foo.func1);
> }
>
> ...
>
> hot_function()
> {
> ...
> static_cal(func1)(args1);
> ...
> }
>
> cold_function()
> {
> ...
> global_foo->func1(args1);
> ...
> }

If global_foo is non-const, then the static call stuff is just an
obfuscated indirect call. The attack CFI attempts to block is having
a controlled write flaw turn into controlled execution. For example,
in the above, an attacker would use a flaw that could aim a write to
global_foo->func1, and then get the kernel to take an execution path
that executes global_foo->func1 (i.e. through cold_function).

If static_call_update() accepts an arbitrary function argument, then it
needs to perform the same validation that is currently being injected
at indirect call sites to avoid having static calls weaken CFI. If
things are left alone, static calls will just insert a direct call to
the jump table, and things "work" (with an extra jump), but nothing
is actually protected (it just requires the attacker locate a more
narrow set of kernel call flows that includes static_call_update). Any
attacker write to global_foo->func1, followed by a static_call_update(),
will create a direct call (with no CFI checking) to the attacker's
destination. (It's likely harder to find the needed execution path to
induce a static_call_update(), but that shouldn't stop us from trying
to protect it.)

Getting a "jump table to actual function" primitive only avoids the added
jump -- all the CFI checking remains bypassed. If static call function
address storage isn't const, for CFI to work as expected the update()
routine will need to do the same checking that is done at indirect call
sites when extracting the "real" function for writing into a direct call.
(i.e. it will need to be function prototype aware, be able to find the
real matching jump table and size, and verify the given function is
actually in the table. Just dereferencing the jump table to find the
address isn't a protection: the attacker just has to write to func1 with
a pointer to an attacker-constructed jump table.)

I think it's not unreasonable to solve this in phases, though. Given that
static calls work with CFI as-is (albeit with an extra jump), and that
static calls do offer some hardening of existing indirect call targets (by
narrowing the execution path needed to actually bring the func1 value into
the kernel execution flow), I don't see either feature blocking the other.

Adding proper CFI function prototype checking to static calls seems
like a security improvement with or without CFI, and a minor performance
improvement under CFI.

To avoid all of this, though, it'd be better if static calls only
switched between one of a per-site const list of possible functions,
which would make it a much tighter hand-rolled CFI system itself. :)
(i.e. it would operate from a specific and short list of expected
functions rather than the "best effort" approach of matching function
prototypes as done by Clang CFI.)

--
Kees Cook