Re: [RFC] security: replace indirect calls with static calls
From: Kees Cook
Date: Thu Aug 20 2020 - 17:46:10 EST
On Thu, Aug 20, 2020 at 06:47:53PM +0200, Brendan Jackman wrote:
> From: Paul Renauld <renauld@xxxxxxxxxx>
>
> LSMs have high overhead due to indirect function calls through
> retpolines. This RPC proposes to replace these with static calls [1]
typo: RFC
> instead.
Yay! :)
> [...]
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
I'd be curious to see other workloads too. (Your measurements are a bit
synthetic, mostly showing the "worst case": one short syscall in a tight
loop. We should still do this, since it's a direct performance
improvement, but I'd like to see the "real world" impact too.)
> [...]
> Previously, the code for this hook would have looked like this:
>
>     ret = DEFAULT_RET;
>
>     for each cb in [A, B, C]:
>         ret = cb(args); <--- costly indirect call here
>         if ret != 0:
>             break;
>
>     return ret;
>
> Static calls are defined at build time and are initially empty (NOP
> instructions). When the LSMs are initialized, the slots are filled as
> follows:
>
>  slot idx     content
>            |-----------|
>     0      |           |
>            |-----------|
>     1      |           |
>            |-----------|
>     2      |   call A  | <-- base_slot_idx = 2
>            |-----------|
>     3      |   call B  |
>            |-----------|
>     4      |   call C  |
>            |-----------|
>
> The generated code will unroll the foreach loop to have a static call for
> each possible LSM:
>
>     ret = DEFAULT_RET;
>     switch(base_slot_idx):
>
>         case 0:
>             NOP
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 1:
>             NOP
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 2:
>             ret = A(args); <--- direct call, no retpoline
>             if ret != 0:
>                 break;
>             // fallthrough
>         case 3:
>             ret = B(args); <--- direct call, no retpoline
>             if ret != 0:
>                 break;
>             // fallthrough
>
>         [...]
>
>         default:
>             break;
>
>     return ret;
>
> A similar logic is applied for void hooks.
>
> Why this trick with a switch statement? The table of static calls is defined
> at compile time. The number of hook callbacks that will be defined is
> unknown at that time, and the table cannot be resized at runtime. Static
> calls do not define a conditional execution for a non-void function, so the
> executed slots must be non-empty. With this use of the table and the
> switch, it is possible to jump directly to the first used slot and execute
> all of the slots after. This essentially makes the entry point of the table
> dynamic. Alternatively, it would be possible to start from 0 and break after
> the final populated slot, but that would require an additional conditional
> after each slot.
Instead of just "NOP", having the static branches perform a jump would
solve this pretty cleanly, yes? Something like:
	ret = DEFAULT_RET;

	ret = A(args); <--- direct call, no retpoline
	if ret != 0:
		goto out;

	ret = B(args); <--- direct call, no retpoline
	if ret != 0:
		goto out;

	goto out;
	if ret != 0:
		goto out;

out:
	return ret;
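Roughly what I have in mind, as a userspace C sketch. Big caveats: the
"active" flag is only a stand-in for a patched jump (in the kernel it
would be a static branch, not a runtime load and test), the function
pointer is a stand-in for the patched direct call, and all the names here
(hook_slot, register_hook, CALL_SLOT, the example hooks) are invented for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS   3
#define DEFAULT_RET 0

typedef int (*hook_fn)(int arg);

struct hook_slot {
    bool active;    /* stand-in for "call" vs. "jump to out" */
    hook_fn fn;     /* stand-in for the direct call target */
};

/* Slots are filled from index 0; unused slots stay inactive. */
static struct hook_slot slots[MAX_SLOTS];
static size_t nr_used;

static void register_hook(hook_fn fn)
{
    assert(fn != NULL && nr_used < MAX_SLOTS);
    slots[nr_used].fn = fn;
    slots[nr_used].active = true;
    nr_used++;
}

/* One unrolled slot: an unused slot jumps straight to the exit. */
#define CALL_SLOT(i)                \
    do {                            \
        if (!slots[i].active)       \
            goto out;               \
        ret = slots[i].fn(arg);     \
        if (ret != 0)               \
            goto out;               \
    } while (0)

static int call_int_hook(int arg)
{
    int ret = DEFAULT_RET;

    CALL_SLOT(0);
    CALL_SLOT(1);
    CALL_SLOT(2);
out:
    return ret;
}

/* Hypothetical hooks, counting how many callbacks actually ran. */
static int calls;
static int hook_a(int arg) { (void)arg; calls++; return 0; }
static int hook_b(int arg) { calls++; return arg < 0 ? -1 : 0; }
```

With this layout there is no base_slot_idx and no switch: execution always
starts at slot 0 and leaves at the first unused slot or the first nonzero
return.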
> [...]
> The number of available slots for each LSM hook is currently fixed at
> 11 (the number of LSMs in the kernel). Ideally, it should automatically
> adapt to the number of LSMs compiled into the kernel.
Seems like a reasonable thing to do and could be a separate patch.
> If there’s no practical way to implement such automatic adaptation, an
> option instead would be to remove the panic call by falling-back to the old
> linked-list mechanism, which is still present anyway (see below).
>
> A few special cases of LSM don't use the macro call_[int/void]_hook but
> have their own calling logic. The linked-lists are kept as a possible slow
> path fallback for them.
I assume you mean the integrity subsystem? That just needs to be fixed
correctly. If we switch to this, let's ditch the linked list entirely.
Fixing integrity's stacking can be a separate patch too.
> [...]
> Signed-off-by: Paul Renauld <renauld@xxxxxxxxxx>
> Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx>
> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
This implies a maintainership chain, with Paul as the sole author. If
you mean all of you worked on the patch, include Co-developed-by: as
needed[1].
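For example, assuming Paul is the From: author, all three of you worked on
the patch, and Brendan is the one sending it, I'd expect the trailers to
look something like this (addresses left as they appear above; adjust if
my assumptions about who did what are wrong):

```
Signed-off-by: Paul Renauld <renauld@xxxxxxxxxx>
Co-developed-by: KP Singh <kpsingh@xxxxxxxxxx>
Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx>
Co-developed-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
```

Each Co-developed-by: is immediately followed by that person's
Signed-off-by:, and the last Signed-off-by: belongs to whoever submits.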
-Kees
[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
--
Kees Cook