Re: [PATCH bpf-next] x86/ftrace: relocate %rip-relative percpu refs in dynamic trampolines
From: Peter Zijlstra
Date: Wed May 27 2026 - 17:12:23 EST
On Wed, May 27, 2026 at 09:12:31PM +0200, Alexis Lothoré (eBPF Foundation) wrote:
> With CONFIG_CALL_DEPTH_TRACKING enabled on an x86 retbleed-affected
> platform (e.g. Skylake), with retbleed=stuff, registering a dynamic
> ftrace trampoline crashes on the first call into the traced function:
>
>
> This small reproducer easily triggers the crash:
>
> # echo 'p __x64_sys_clock_nanosleep' > /sys/kernel/tracing/kprobe_events
> # echo 1 > /sys/kernel/tracing/events/kprobes/p___x64_sys_clock_nanosleep_0/enable
> # usleep 1
>
> Monitoring the crash under GDB points to the exact instruction in charge
> of incrementing the call depth:
>
> sarq $5, %gs:__x86_call_depth(%rip)
>
> This instruction matches the one inserted by ftrace_regs_caller in
> ftrace_64.S. The emitted code likely worked fine until commit
> 59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to
> PER_CPU_VAR()"), which made the call depth accounting address
> %rip-relative instead of absolute. Since the exact location of the
> copied code depends on where the trampoline lives in memory, the
> displacement must be adjusted at runtime to find the per-cpu
> __x86_call_depth value; otherwise the computed address is wrong,
> leading to the page fault seen above.
>
> Fix the %rip-relative displacement of the copied CALL_DEPTH_ACCOUNT
> instruction (from ftrace_regs_caller) by calling
> text_poke_apply_relocation(), as is done for example by the x86 BPF
> JIT compiler through x86_call_depth_emit_accounting(). This corrects
> both CALL_DEPTH_ACCOUNT slots, in ftrace_caller and ftrace_regs_caller.
>
> Fixes: 59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()")
> Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@xxxxxxxxxxx>
> ---
> arch/x86/kernel/ftrace.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 0543b57f54ee..357df1b2922c 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -375,6 +375,13 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
> goto fail;
> }
>
> + /*
> + * Generated trampoline may contain rip-relative addressing which
> + * displacement needs to be fixed
> + */
> + text_poke_apply_relocation(trampoline, trampoline, size,
> + (void *)start_offset, size);
> +
> /*
> * The address of the ftrace_ops that is used for this trampoline
> * is stored at the end of the trampoline. This will be used to
I went and had a quick grep through the tree to see if there are more
sites that were missed in the conversion (commit 17bce3b2ae2d), but I
couldn't find another one.
Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>