Re: [PATCH v5 00/15] x86: Add support for Clang CFI
From: Kees Cook
Date: Wed Oct 27 2021 - 18:28:05 EST
On Wed, Oct 27, 2021 at 11:21:26PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 27, 2021 at 10:11:28AM -0700, Kees Cook wrote:
> > On Wed, Oct 27, 2021 at 03:04:55PM +0200, Peter Zijlstra wrote:
> > > [...]
> > > cold_function()
> > > {
> > > ...
> > > global_foo->func1(args1);
> > > ...
> > > }
> >
> > If global_foo is non-const, then the static call stuff is just an
> > obfuscated indirect call.
>
> It is not. The target is determined once, at boot time, depending on the
> hardware, it then never changes. The static_call() results in a direct
> call to that function.
Right, I mean it is _effectively_ an indirect call in the sense that the
call destination is under the control of a writable memory location.
Yes, static_call_update() must be called to make the change, hence why I
didn't wave my arms around when static_call was originally added. It's a
net benefit for the kernel: actual indirect calls now become updatable
direct calls. This means reachability becomes much harder for attackers;
I'm totally a fan. :)
> > If static_call_update() accepts an arbitrary function argument, then it
> > needs to perform the same validation that is currently being injected
> > at indirect call sites to avoid having static calls weaken CFI.
>
> static_call_update() is a macro and has compile time signature checks,
> the actual function is __static_call_update() and someone can go add
> extra validation in there if they so desire.
>
> I did have this patch:
>
> https://lkml.kernel.org/r/20210904105529.GA5106@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> but I never did get around to finishing it. Although looking at it now,
> I suppose static_call_seal() might be a better name.
Right -- though wouldn't just adding __ro_after_init do the same?
DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;
If you wanted the clean WARN_ON, __static_call_update() could check if
the struct static_call_key is in a non-writable region.
> And you're worried about __static_call_update() over text_poke_bp()
> because?
Both have a meaningful lack of exposure to common attacker-reachable
code paths (e.g., it's likely rare that there are ways attackers can
control the target/content of a text_poke_bp(): alternatives and ftrace,
both of which tend to use read-only content).
static_call_update() is limited per-site with a fixed set of hardcoded
targets (i.e. any place static_call() is used), but the content
is effectively arbitrary (coming from writable memory). AIUI, it's
intended to be used more widely than text_poke_bp(), and it seems like
call sites using static_call_update() will become less rare in future
kernels. (Currently it seems mostly focused on sched and pmu, which
don't traditionally have much userspace control).
So, they're kind of opposite, but for me the question is for examining
the changes to reachability and what kind of primitives their existence
provides an attacker. IMO, static_calls are a net gain over indirect
calls (from some usually writable function pointer) because it becomes
a direct call. It's risk doesn't match a "real" direct call, though,
since they do still have the (much more narrow) risk of having something
call static_call_update() from a writable function pointer as in your
example, but that's still a defensively better position than an regular
indirect call.
What I'm hoping to see from static_calls is maybe defaulting to being
ro_after_init, and explicitly marking those that can't be, which makes
auditing easier and put things into a default-safe state (i.e. both
fixed targets and fixed content).
> > Getting a "jump table to actual function" primitive only avoids the added
> > jump -- all the CFI checking remains bypassed.
>
> Exactly, so the extra jump serves no purpose and needs to go. Doubly so
> because people are looking at static_call() to undo some of the
> performance damage introduced by CFI :-)
Well, sure, it's inefficient, not _broken_. :) And can you point to the
"use static_calls because CFI is slow" discussion? I'd be curious to read
through that; the past performance testing had shown that CFI performance
overhead tends to be less than the gains of LTO. So compared to a "stock"
kernel, things should be about the same if not faster.
That said, I understand I'm talking to you, and you're paying very close
attention to the scheduler, etc. :) I'm sure there are specific
workloads that look terrible under CFI!
> > If static call function
> > address storage isn't const, for CFI to work as expected the update()
> > routine will need to do the same checking that is done at indirect call
> > sites when extracting the "real" function for writing into a direct call.
>
> I've mentioned static_call like a hundred times in these CFI threads..
> if you want to do CFI on them, go ahead. I'm just not sure the C type
> system is up for that, you'll have to somehow frob the signature symbol
> into __static_call_update(), something like __builtin_type_symbol().
>
> > To avoid all of this, though, it'd be better if static calls only
> > switched between one of a per-site const list of possible functions,
> > which would make it a much tighter hand-rolled CFI system itself. :)
> > (i.e. it would operate from a specific and short list of expected
> > functions rather than the "best effort" approach of matching function
> > prototypes as done by Clang CFI.)
>
> That sounds like a ton of painful ugly.
Right, I know. That's why I'm saying that I see these features as
certainly related, but not at odds with each other. CFI doesn't protect
static_call sites, but static_call sites are already more well protected
than their earlier indirect calls.
The only note I had was that if static_call wants to dig into the jump
table, it likely needs to static_call-specific: we don't want a general
way to do that without knowing we have some reasonable bounds on the
behavior of the code using it. I think it's fine to have static_calls
do this, though it'd be nice if they could be a little more defensive
(generally) at the same time.
--
Kees Cook