Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline

From: Menglong Dong
Date: Mon Nov 10 2025 - 20:48:20 EST


On 2025/11/11 00:32, Alexei Starovoitov wrote:
> On Mon, Nov 10, 2025 at 3:43 AM Menglong Dong <menglong.dong@xxxxxxxxx> wrote:
> >
> >
> > Do you think it is worth implementing livepatch with the
> > bpf trampoline by introducing CONFIG_LIVEPATCH_BPF?
> > It's easy to achieve: I have a POC for it, and the performance
> > of livepatch increases from 99M/s to 200M/s according to
> > my bench testing.
>
> what do you mean exactly?

This is a separate topic, and we can discuss it later. Let
me give a brief description here.

I mean implementing livepatch with the bpf trampoline. For now,
livepatch is implemented on top of ftrace, which breaks the
RSB and has more overhead on x86_64.

It can be implemented easily by replacing "orig_call" with the
address that livepatch provides.

> I don't want to add more complexity to bpf trampoline.

If you mean the arch-specific part, it won't add complexity. On the
contrary, it can make x86_64 a little simpler, with the following
patch:

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3176,7 +3176,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
void *rw_image_end, void *image,
const struct btf_func_model *m, u32 flags,
struct bpf_tramp_links *tlinks,
- void *func_addr)
+ void *func_addr, void *origin_call_param)
{
int i, ret, nr_regs = m->nr_args, stack_size = 0;
int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
@@ -3280,6 +3280,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
orig_call += ENDBR_INSN_SIZE;
orig_call += X86_PATCH_SIZE;
}
+ orig_call = origin_call_param ?: orig_call;

prog = rw_image;

@@ -3369,15 +3370,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack_size);
}

- if (flags & BPF_TRAMP_F_ORIG_STACK) {
- emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
- EMIT2(0xff, 0xd3); /* call *rbx */
- } else {
- /* call original function */
- if (emit_rsb_call(&prog, orig_call, image + (prog - (u8 *)rw_image))) {
- ret = -EINVAL;
- goto cleanup;
- }
+ /* call original function */
+ if (emit_rsb_call(&prog, orig_call, image + (prog - (u8 *)rw_image))) {
+ ret = -EINVAL;
+ goto cleanup;
}
/* remember return value in a stack for bpf prog to access */
emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);

> Improve current livepatching logic ? jmp vs call isn't special.

Sort of. According to my testing, the performance of the bpf
trampoline is much better than the ftrace trampoline, so if we
implement livepatch with the bpf trampoline, the performance can
be improved. Of course, the bpf trampoline needs to offer an API
to livepatch for this purpose.

Anyway, let me finish the work in this patch first. After that,
I can send an RFC for the proposal.

Thanks!
Menglong Dong

>
> > The results above were tested with the return thunk disabled. With the
> > return thunk enabled, the performance decreases from 58M/s to
> > 52M/s. The main performance improvement comes from the RSB,
> > and the return thunk always breaks the RSB, which leaves no
> > improvement. The calls to the per-cpu-ref get and put make
> > the bpf trampoline based livepatch perform worse
> > than the ftrace based one.
> >
> > Thanks!
> > Menglong Dong