Re: [PATCH v6 2/2] arm64: implement support for static call trampolines
From: Carlos Llamas
Date: Thu Mar 12 2026 - 18:10:47 EST
On Thu, Mar 12, 2026 at 07:35:34PM +0100, Ard Biesheuvel wrote:
> Hi Carlos,
>
> On Thu, 12 Mar 2026, at 19:02, Carlos Llamas wrote:
> > On Fri, Nov 05, 2021 at 03:59:17PM +0100, Ard Biesheuvel wrote:
> >> Implement arm64 support for the 'unoptimized' static call variety, which
> >> routes all calls through a single trampoline that is patched to perform a
> >> tail call to the selected function.
> >>
> >> It is expected that the direct branch instruction will be able to cover
> >> the common case. However, given that static call targets may be located
> >> in modules loaded out of direct branching range, we need a fallback path
> >> that loads the address into R16 and uses a branch-to-register (BR)
> >> instruction to perform an indirect call.
> >>
> >> Unlike on x86, there is no pressing need on arm64 to avoid indirect
> >> calls at all cost, but hiding it from the compiler as is done here does
> >> have some benefits:
> >> - the literal is located in .text, which gives us the same robustness
> >> advantage that code patching does;
> >> - no performance hit on CFI enabled Clang builds that decorate compiler
> >> emitted indirect calls with branch target validity checks.
> >>
> >> Acked-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> >> ---
> >
> > I'm starting to test this out on top of 7.0-rc3...
> >
>
> Please use the v3 I referred to on the thread. The code patching is a
> bit hairy, and so we should only consider that if there is a real use
> case for it.
>
> Same goes for the special handling of the ret0 case - AFAIR, the v3
> handles that transparently as it will just use the generic RET0 handler
> as the target.

IIUC, the trampoline tests R16 and returns early if it is unset. For the
RET0 case, however, R16 would point to __static_call_return0, so the
trampoline would do the indirect call anyway. I suppose CFI is fine with
this since, as you mentioned, the call is "hidden" from the compiler.

That said, the R16 test seems unnecessary for the RET0 case, since R16 is
always set to _something_ there, right? Or maybe I'm missing something?
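
To make sure we are talking about the same thing, here is a rough C model
of the flow as I understand it. This is only a sketch: none of these names
come from the patch, model_return0() merely stands in for
__static_call_return0, and the real trampoline is of course assembly
rather than C.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a static call target (illustration only). */
typedef long (*target_fn)(void);

/* Stand-in for __static_call_return0: a real function that returns 0. */
static long model_return0(void)
{
	return 0;
}

/* Models the address literal that the trampoline loads into R16. */
struct model_trampoline {
	target_fn literal;
};

/*
 * Models one pass through the trampoline. Returns true if the indirect
 * branch was taken, false if the trampoline returned early.
 */
static bool model_call(const struct model_trampoline *t, long *ret)
{
	target_fn r16 = t->literal;	/* load the target address into "R16" */

	if (r16 == NULL)		/* the R16 test: unset means return early */
		return false;

	*ret = r16();			/* the branch-to-register (BR) path */
	return true;
}
```

With the literal left NULL the model returns early and leaves *ret alone.
With the literal pointing at model_return0() the test passes and the
indirect branch is taken, which is the RET0 behaviour I am asking about.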

Either way, now that I've rebased the patch, I can send a... v7? With the
required minor tweaks for RET0 and such.

--
Carlos Llamas