Re: [PATCH] static_call,x86: Robustify trampoline patching

From: Ard Biesheuvel
Date: Sat Oct 30 2021 - 13:20:13 EST


On Sat, 30 Oct 2021 at 09:49, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Oct 29, 2021 at 10:03:24PM +0200, Peter Zijlstra wrote:
>
> > So I had a bit of a peek at what clang generates:
> >
> > 3fa4: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 3fa7: R_X86_64_32S __SCK__x86_pmu_handle_irq
> > 3fab: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 3fae: R_X86_64_32S __SCT__x86_pmu_handle_irq.cfi_jt
> > 3fb2: e8 00 00 00 00 call 3fb7 <init_hw_perf_events+0x1dc> 3fb3: R_X86_64_PLT32 __static_call_update-0x4
> >
> > So this then gives the trampoline jump table entry to
> > __static_call_update(), with the result that it will rewrite the
> > jump-table entry, not the trampoline!
> >
> > Now it so happens that the trampoline looks *exactly* like the
> > jump-table entry (one jmp.d32 instruction), so in that regards it'll
> > again 'work'.
> >
> > But this is all really, as in *really*, wrong. And I'm really sad I'm
> > the one to have to discover this, even though I've mentioned
> > static_call()s being tricky in previous reviews.
>
> The below makes the clang-cfi build properly sick:
>
> [ 0.000000] trampoline signature fail
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] kernel BUG at arch/x86/kernel/static_call.c:65!
>
> ---
> Subject: static_call,x86: Robustify trampoline patching
>
> Add a few signature bytes after the static call trampoline and verify
> those bytes match before patching the trampoline. This avoids patching
> random other JMPs (such as CFI jump-table entries) instead.
>
> These bytes decode as:
>
> d: 53 push %rbx
> e: 43 54 rex.XB push %r12
>
> And happen to spell "SCT".
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>

I just realized that arm64 has the exact same problem, which is not
being addressed by my v5 of the static call support patch.

As it turns out, the v11 Clang that I have been testing with is broken
wrt BTI landing pads, and omits them from the jump table entries.
Clang 12+ adds them properly, which means that both the jump table
entry and the static call trampoline may start with BTI C + direct
branch, and we also need additional checks to disambiguate.