Re: [PATCH] static_call,x86: Robustify trampoline patching

From: Ard Biesheuvel
Date: Sun Oct 31 2021 - 12:25:03 EST


On Sat, 30 Oct 2021 at 20:55, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Sat, 30 Oct 2021 at 20:03, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Sat, Oct 30, 2021 at 07:19:53PM +0200, Ard Biesheuvel wrote:
> > > I just realized that arm64 has the exact same problem, which is not
> > > being addressed by my v5 of the static call support patch.
> >
> > Yeah, it would.
> >
> > > As it turns out, the v11 Clang that I have been testing with is broken
> > > wrt BTI landing pads, and omits them from the jump table entries.
> > > Clang 12+ adds them properly, which means that both the jump table
> > > entry and the static call trampoline may start with BTI C + direct
> > > branch, and we also need additional checks to disambiguate.
> >
> > I'm not sure, why would the static_call trampoline need a BTI C ? The
> > whole point of static_call() is to be a direct call, we should never
> > have an indirect call to the trampoline, that would defeat the whole
> > purpose.
>
> This might happen when the distance between the caller and the
> trampoline is more than 128 MB, in which case we emit a veneer that
> uses an indirect call as well. So we definitely need the landing pad
> in the trampoline.

Something like the below seems to work to prevent getting the wrong
trampoline address into arch_static_call_transform:

diff --git a/arch/x86/include/asm/static_call.h
b/arch/x86/include/asm/static_call.h
index cbb67b6030f9..c3704ea21bee 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -25,7 +25,9 @@
asm(".pushsection .static_call.text, \"ax\" \n" \
".align 4 \n" \
".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+ ".globl " STATIC_CALL_TRAMP_P_STR(name) " \n" \
STATIC_CALL_TRAMP_STR(name) ": \n" \
+ STATIC_CALL_TRAMP_P_STR(name) ": \n" \
insns " \n" \
".type " STATIC_CALL_TRAMP_STR(name) ", @function \n" \
".size " STATIC_CALL_TRAMP_STR(name) ", . - "
STATIC_CALL_TRAMP_STR(name) " \n" \
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 19dc210214c0..46777a3395d3 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -143,7 +143,7 @@
*/
extern void arch_static_call_transform(void *site, void *tramp, void
*func, bool tail);

-#define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP(name)
+#define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP_P(name)

#else
#define STATIC_CALL_TRAMP_ADDR(name) NULL
diff --git a/include/linux/static_call_types.h
b/include/linux/static_call_types.h
index 5a00b8b2cf9f..98a448f5ae45 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -18,6 +18,8 @@
#define STATIC_CALL_TRAMP(name)
__PASTE(STATIC_CALL_TRAMP_PREFIX, name)
#define STATIC_CALL_TRAMP_STR(name) __stringify(STATIC_CALL_TRAMP(name))

+#define STATIC_CALL_TRAMP_P(name) __PASTE(STATIC_CALL_TRAMP(name), _p)
+#define STATIC_CALL_TRAMP_P_STR(name) __stringify(STATIC_CALL_TRAMP_P(name))
/*
* Flags in the low bits of static_call_site::key.
*/
@@ -36,7 +38,8 @@ struct static_call_site {

#define DECLARE_STATIC_CALL(name, func)
\
extern struct static_call_key STATIC_CALL_KEY(name); \
- extern typeof(func) STATIC_CALL_TRAMP(name);
+ extern typeof(func) STATIC_CALL_TRAMP(name); \
+ extern u8 STATIC_CALL_TRAMP_P(name);

#ifdef CONFIG_HAVE_STATIC_CALL

That leaves the 'func' argument, which ideally should not go through
the jump table either, but at least it is not terminally broken there.