Re: [PATCH] static_call,x86: Robustify trampoline patching

From: Ard Biesheuvel
Date: Mon Nov 01 2021 - 10:14:57 EST


On Mon, 1 Nov 2021 at 10:05, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 01, 2021 at 12:36:18AM +0100, Ard Biesheuvel wrote:
> > On Sun, 31 Oct 2021 at 21:45, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, Oct 31, 2021 at 09:21:56PM +0100, Ard Biesheuvel wrote:
> > >
> > > > That means we can support static calls on arm64 now without breaking
> > > > Clang CFI, and work on a solution for the redundant jumps on a more
> > > > relaxed schedule.
> > >
> > > Yes, arm64 has a 'problem' with having already merged the clang-cfi
> > > stuff :/
> > >
> > > I'm hoping the x86 solution can be an alternative CFI scheme, I'm
> > > starting to really hate this one. And I'm not at all convinced the
> > > proposed scheme is the best possible scheme given the constraints of
> > > kernel code. AFAICT it's a compromise made in userspace.
> >
> > Your scheme only works with IBT: the value of %r11 is under the
> > adversary's control so it could just point it at 'foo+0x10' if it
> > wants to call foo indirectly, and circumvent the check. So without IBT
> > (or BTI), I think the check fundamentally belongs in the caller, not
> > in the callee.
>
> How is that not true for the jump table approach? Like I showed earlier,
> it is *trivial* to reconstruct the actual function pointer from a
> jump-table entry pointer.
>

That is not the point. The point is that Clang instruments every
indirect call that it emits, to check whether the type of the jump
table entry it is about to call matches the type expected at the call
site. IOW, indirect calls can only branch into jump tables, and every
entry in a given table branches to the start of some function of the
same type.

So the only thing you could achieve by adding a constant value to, or
subtracting one from, the indirect call address is either calling
another function of the same type (if you are hitting another entry in
the same table), or failing the CFI type check.
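
To make that concrete, here is a rough model of the caller-side check.
It is only a sketch: the names (struct jump_table, cfi_check) and the
8-byte entry size are assumptions made for illustration, not what
Clang actually emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one jump table entry; illustrative only. */
#define ENTRY_SIZE 8u

/* Hypothetical descriptor for the jump table of one function type. */
struct jump_table {
	uintptr_t base;		/* address of the first entry */
	size_t nr_entries;	/* number of entries in the table */
};

/*
 * Model of the check done before an indirect call: the target must be
 * the start of an entry inside the table for the expected type.
 */
static bool cfi_check(const struct jump_table *jt, uintptr_t target)
{
	/* Unsigned subtraction wraps to a huge value if target < base. */
	uintptr_t off = target - jt->base;

	if (off % ENTRY_SIZE)
		return false;	/* points into the middle of an entry */

	return off / ENTRY_SIZE < jt->nr_entries;
}
```

With a table of 4 entries at 0x1000, a target of 0x1008 passes (it is
another entry of the same type), while 0x1004 and 0x1020 both fail.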

Instrumenting only the callee requires something like BTI, plus
consistent use of landing pads, to ensure that you cannot trivially
skip the check by landing right after it.
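
A toy model of why a callee-side check depends on landing pads. This
is a sketch under simplifying assumptions: the function name, the
type-ID parameters and the rule that any nonzero offset lands past the
check are all made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum outcome { CALL_OK, CALL_TRAP };

/*
 * Model an indirect call to 'callee + offset', where the check sits at
 * the very start of the callee. Simplification: any nonzero offset is
 * treated as landing right after the check.
 */
static enum outcome callee_checked_call(uint32_t callee_type,
					uint32_t caller_type,
					unsigned int offset,
					bool landing_pads_enforced)
{
	/* With BTI/IBT, indirect branches may only land on a landing pad,
	 * modeled here as offset 0. */
	if (landing_pads_enforced && offset != 0)
		return CALL_TRAP;

	/* Entering at the start runs the type check. */
	if (offset == 0)
		return caller_type == callee_type ? CALL_OK : CALL_TRAP;

	/* Entering past the check skips it entirely. */
	return CALL_OK;
}
```

In this model a type-mismatched call at offset 0x10 succeeds without
landing pad enforcement and traps with it.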

> In any case, I really want the discussion to start at square one, and
> show/explain why any chosen CFI scheme is actually good for the kernel.
> Just because clang happened to have implemented it, doesn't make it the
> most suitable scheme for the kernel.

Not disagreeing with that for x86, but again, we're already past that
for arm64.