Re: [PATCH v5 00/15] x86: Add support for Clang CFI

From: Peter Zijlstra
Date: Thu Oct 28 2021 - 16:29:34 EST


On Thu, Oct 28, 2021 at 10:12:32AM -0700, Kees Cook wrote:
> On Thu, Oct 28, 2021 at 01:09:39PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 27, 2021 at 03:27:59PM -0700, Kees Cook wrote:
> >
> > > Right -- though wouldn't just adding __ro_after_init do the same?
> > >
> > > DEFINE_STATIC_CALL(static_call_name, func_a) __ro_after_init;
> >
> > That breaks modules (and your jump_label patch doing the same is
> > similarly broken).
>
> Well that's no fun. :) I'd like to understand this better so I can fix
> it!
>
> >
> > When a module is loaded that uses the static_call(), it needs to
> > register it's .static_call_sites range with the static_call_key which
> > requires modifying it.
>
> Reading static_call_add_module() leaves me with even more questions. ;)

Yes, that function is highly magical..

> It looks like module static calls need to write to kernel text?

No, they need to modify the static_call_key though.

> I don't
> understand. Is this when a module is using an non-module key for a call
> site? And in that case, this happens:
>
> key |= s_key & STATIC_CALL_SITE_FLAGS;
>
> Where "key" is not in the module?
>
> And the flags can be:
>
> #define STATIC_CALL_SITE_TAIL 1UL /* tail call */
> #define STATIC_CALL_SITE_INIT 2UL /* init section */
>
> But aren't these per-site attributes? Why are they stored per-key?

They are per site, but stored in the key pointer.

So static_call has (and jump_label is nearly identical):

	struct static_call_site {
		s32 addr;
		s32 key;
	};

	struct static_call_mod {
		struct static_call_mod *next;
		struct module *mod;
		struct static_call_site *sites;
	};

	struct static_call_key {
		void *func;
		union {
			unsigned long type;
			struct static_call_mod *mods;
			struct static_call_site *sites;
		};
	};

__SCT_##name() trampolines (no analog with jump_label)

.static_call_sites section
.static_call_tramp_key section (no analog with jump_label)

Where the key holds the current function pointer and a pointer to either
an array of static_call_site or a pointer to a static_call_mod.

Now, a key observation is that all these data structures are word
aligned, which means we have at least 2 low-order bits to play with. For
static_call_key::{mods,sites} the LSB indicates which, 0:mods, 1:sites.

Then the .static_call_sites section is an array of struct
static_call_site sorted by the static_call_key pointer.

Each static_call_site holds relative displacements, but represents:

struct static_call_key *key;
unsigned long call_address;
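
For illustration, a minimal userspace sketch of how such self-relative
s32 fields can be turned back into absolute addresses. The demo_* names
are made up for this mail, and the assumption is that each field stores
the distance from its own location to the target:

```c
#include <stdint.h>

/* Hypothetical mirror of struct static_call_site, using stdint types. */
struct demo_site {
	int32_t addr;	/* call address, relative to &addr */
	int32_t key;	/* key address (plus flag bits), relative to &key */
};

/* Absolute address = stored displacement + address of the field itself. */
static inline uintptr_t demo_site_addr(const struct demo_site *site)
{
	return (uintptr_t)((intptr_t)site->addr + (intptr_t)&site->addr);
}

static inline uintptr_t demo_site_key(const struct demo_site *site)
{
	return (uintptr_t)((intptr_t)site->key + (intptr_t)&site->key);
}

/* Inverse operation; only valid when both targets are within +/-2G. */
static inline void demo_site_set(struct demo_site *site,
				 uintptr_t addr, uintptr_t key)
{
	site->addr = (int32_t)((intptr_t)addr - (intptr_t)&site->addr);
	site->key  = (int32_t)((intptr_t)key  - (intptr_t)&site->key);
}
```

The point of the relative form is that each entry is 8 bytes instead of
16 on a 64-bit machine, and the table needs no relocations.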

Now, since code (on x86) is variable length, there are no spare bits in
the code address, but since static_call_key is aligned, we have spare
bits. It is those bits we use to encode TAIL (Bit0) and INIT (Bit1).
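
A small sketch of that encoding, in plain userspace C. The two flag
values are the ones quoted above; STATIC_CALL_SITE_FLAGS is assumed to
be simply the mask of both, and the demo_* helpers and struct demo_key
are hypothetical stand-ins, not the kernel's own functions:

```c
#include <stdint.h>
#include <stdbool.h>

#define STATIC_CALL_SITE_TAIL	1UL	/* tail call */
#define STATIC_CALL_SITE_INIT	2UL	/* init section */
#define STATIC_CALL_SITE_FLAGS	3UL	/* assumed: mask of both bits */

/* Stand-in for the real key; only its alignment matters here. */
struct demo_key {
	void *func;
	unsigned long type;
};

/* Pack per-site flags into the low bits of an aligned key address. */
static inline uintptr_t demo_pack(struct demo_key *key, unsigned long flags)
{
	return (uintptr_t)key | (flags & STATIC_CALL_SITE_FLAGS);
}

/* Strip the flag bits to recover the real key pointer. */
static inline struct demo_key *demo_key_of(uintptr_t v)
{
	return (struct demo_key *)(v & ~(uintptr_t)STATIC_CALL_SITE_FLAGS);
}

static inline bool demo_is_tail(uintptr_t v)
{
	return v & STATIC_CALL_SITE_TAIL;
}

static inline bool demo_is_init(uintptr_t v)
{
	return v & STATIC_CALL_SITE_INIT;
}
```

So the flags describe the site, they just happen to ride along in the
site's copy of the key address.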

If INIT, the address points to an __init section and we shouldn't try
and touch it after those have been freed or bad stuff happens.

If TAIL, it's a tail-call and we get to write a jump instruction instead
of a call instruction.

So, objtool builds .static_call_sites at build time, then at init (or
module load) time we sort the array by static_call_key pointer, such
that we get consecutive ranges per key. We iterate the array and every
time the key pointer changes, we -- already having the key pointer --
set key->sites to the first.
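
Roughly, as a userspace sketch. This is simplified: the sites are
assumed to be already decoded into absolute pointers with the flag bits
stripped, the type bit in the key is left out, and all ex_* names are
invented for the example:

```c
#include <stdlib.h>
#include <stdint.h>

struct ex_key;

struct ex_site {
	uintptr_t addr;		/* decoded call address */
	struct ex_key *key;	/* decoded key pointer, flags stripped */
};

struct ex_key {
	void *func;
	struct ex_site *sites;	/* first site of this key's range */
};

/* Order sites by the address of the key they belong to. */
static int ex_cmp(const void *a, const void *b)
{
	uintptr_t ka = (uintptr_t)((const struct ex_site *)a)->key;
	uintptr_t kb = (uintptr_t)((const struct ex_site *)b)->key;

	return (ka > kb) - (ka < kb);
}

/* Sort, then record the start of each consecutive per-key range. */
static void ex_init_sites(struct ex_site *start, size_t n)
{
	struct ex_key *prev = NULL;
	size_t i;

	qsort(start, n, sizeof(*start), ex_cmp);

	for (i = 0; i < n; i++) {
		if (start[i].key != prev) {
			prev = start[i].key;
			prev->sites = &start[i];
		}
	}
}
```

Note that nothing is allocated here; the key only ever points into the
one sorted array.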

Now, kernel init of static_call happens *really* early and memory
allocation doesn't work yet, which is why we have that {mods,sites}
thing. Therefore, when the first module gets loaded, we need to allocate
a struct static_call_mod for the kernel (mod==NULL) and transfer the
sites pointer to it and change key to a mods pointer.
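
As a sketch of that conversion, under the LSB convention above (1:sites,
0:mods). The t_* names are hypothetical, calloc() stands in for the
kernel allocator, and locking and error paths are ignored:

```c
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

struct module;			/* opaque in this sketch */

struct t_mod {
	struct t_mod *next;
	struct module *mod;	/* NULL means "the kernel itself" */
	void *sites;
};

struct t_key {
	void *func;
	uintptr_t type;		/* bit 0 set: sites, clear: mods */
};

static bool t_has_mods(const struct t_key *key)
{
	return !(key->type & 1);
}

/* Move the kernel's sites pointer into a freshly allocated t_mod. */
static int t_convert_to_mods(struct t_key *key)
{
	struct t_mod *m;

	if (t_has_mods(key))
		return 0;	/* already converted */

	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;

	m->mod = NULL;
	m->sites = (void *)(key->type & ~(uintptr_t)1);

	/* calloc() memory is aligned, so bit 0 is clear: now a mods pointer. */
	key->type = (uintptr_t)m;
	return 0;
}
```

The last store is the write into static_call_key that happens at module
load time, which is what gets in the way of making the key read-only.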

So one possible solution would be to have a late init (but before RO),
that, re-iterates the sites array and pre-allocates the kernel
static_call_mod structure. That way, static_call_key gets changed to a
mods pointer and wouldn't ever need changing after that, only the
static_call_mod (which isn't RO) gets changed when modules get
added/deleted.

The above is basically identical to jump_labels. However static_call()
has one more trick:

EXPORT_STATIC_CALL_TRAMP()

That exports the trampoline symbol, but not the static_call_key data
structure. The result is that modules can use the static_call(), but
cannot use static_call_update() because they cannot get at the key.

In this case objtool cannot correctly put the static_call_key address in
the static_call_site; what it does instead is store the trampoline
address (there's a 1:1 relation between keys and trampolines). And then we
use the .static_call_tramp_key section to find a mapping from trampoline
to key and rewrite the site to be 'right'. All this happens before
sorting it on key obv.
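
A rough sketch of that fixup. The table layout here (absolute addresses,
linear search) and the map_* names are assumptions made for the example,
not the actual section format; it also assumes the trampoline address is
aligned enough to carry the two flag bits, as the quoted
'key |= s_key & STATIC_CALL_SITE_FLAGS' line suggests:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded entry of a trampoline -> key table. */
struct map_entry {
	uintptr_t tramp;
	uintptr_t key;
};

/* Return the key for a trampoline address, or 0 if there is none. */
static uintptr_t map_tramp_to_key(const struct map_entry *tab, size_t n,
				  uintptr_t tramp)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].tramp == tramp)
			return tab[i].key;
	}
	return 0;
}

/*
 * A site's key field holds a trampoline address plus flag bits;
 * swap in the real key address and carry the flags over.
 */
static int map_fix_site(uintptr_t *site_key, const struct map_entry *tab,
			size_t n)
{
	uintptr_t flags = *site_key & 3;
	uintptr_t key = map_tramp_to_key(tab, n, *site_key & ~(uintptr_t)3);

	if (!key)
		return -1;	/* leave the site untouched */

	*site_key = key | flags;
	return 0;
}
```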

Hope that clarifies things, instead of making it worse :-)