Re: [PATCH] static_call: use CFI-compliant return0 stubs

From: Ard Biesheuvel

Date: Thu Mar 12 2026 - 03:41:00 EST


Hi Carlos,

You've cc'ed around 50 people on this patch, which is a bit excessive. Better to take get_maintainer.pl's output with a grain of salt when it proposes a cc list like that.

On Thu, 12 Mar 2026, at 01:16, Carlos Llamas wrote:
> On Thu, Mar 12, 2026 at 12:14:06AM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 11, 2026 at 10:57:40PM +0000, Carlos Llamas wrote:
>> > Architectures with !HAVE_STATIC_CALL (such as arm64) rely on the generic
>> > static_call implementation via indirect calls. In particular, users of
>> > DEFINE_STATIC_CALL_RET0, default to the generic __static_call_return0
>> > stub to optimize the unset path.
>> >
>> > However, __static_call_return0 has a fixed signature of "long (*)(void)"
>> > which may not match the expected prototype at callsites. This triggers
>> > CFI failures when CONFIG_CFI is enabled. A trivial linux-perf command
>> > does it:
>>
>> *sigh*...
>>
>> And ARM64 can't really do the inline thing because its immediate range
>> is too small and it all turns into a mess constructing the address in a
>> register and doing an indirect call anyway, right?
>>
>
> Right, the range for the jump is very limited. I _think_ tracepoints
> have managed to implement the trampoline work-around:
> arch/arm64/kernel/ftrace.c
>
> So it looks do-able, I think, but via a much more complex route.
>

So far, we have managed to avoid the blessings of objtool on arm64, and the complexity associated with the inline patching is not really justified: on arm64 there is no real need to avoid indirect calls in the first place (and, as Peter says, we might end up with them anyway).

A while ago, I had a stab at implementing the out-of-line variety [0], but nobody cared enough to even respond. It is rather concise, and localised to arm64, so it is something we might consider for CONFIG_CFI builds. It is essentially the same sequence that arm64 uses for trampolines between modules and the kernel when they are out of direct branching range, with some .rodata patching to change the target. (arm64 basically only permits code patching without stopping the machine when it involves patching branch opcodes into NOPs or vice versa.)
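The general idea is along these lines (a from-memory sketch with hypothetical names, not the exact code from [0]): the branch target lives in a literal slot next to the trampoline, so retargeting a static call is a data-side store plus cache maintenance rather than instruction patching.

```asm
	/* Out-of-line static call trampoline sketch (hypothetical) */
	.align	3
my_static_call_tramp:
	ldr	x16, 0f		// load the current target address
	br	x16		// tail-call it
0:	.quad	my_default_func	// literal slot, patched to retarget
```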

Doing this only for CONFIG_CFI makes sense: it removes the CFI overhead for all static calls, although it adds back some overhead for the trampoline itself. There is currently no need to do it unconditionally.

[0] https://lore.kernel.org/linux-arm-kernel/20201120082103.4840-1-ardb@xxxxxxxxxx/