Re: [PATCH bpf v2 4/4] arm64: rethook: Replace kretprobe trampoline with rethook

From: Masami Hiramatsu
Date: Fri Apr 29 2022 - 11:55:58 EST


Hi Mark,

Thanks for the comments, and sorry for the late reply.

On Tue, 12 Apr 2022 20:04:28 +0100
Mark Rutland <mark.rutland@xxxxxxx> wrote:

> > diff --git a/arch/arm64/kernel/rethook.c b/arch/arm64/kernel/rethook.c
> > new file mode 100644
> > index 000000000000..07d8f6ea34a0
> > --- /dev/null
> > +++ b/arch/arm64/kernel/rethook.c
> > @@ -0,0 +1,28 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Generic return hook for arm64.
> > + * Most of the code is copied from arch/arm64/kernel/probes/kprobes.c
> > + */
> > +
> > +#include <linux/kprobes.h>
> > +#include <linux/rethook.h>
> > +
> > +/* This is called from arch_rethook_trampoline() */
> > +unsigned long __used arch_rethook_trampoline_callback(struct pt_regs *regs);
> > +
> > +unsigned long __used arch_rethook_trampoline_callback(struct pt_regs *regs)
> > +{
> > + return rethook_trampoline_handler(regs, regs->regs[29]);
> > +}
> > +NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
> > +
> > +int arch_rethook_prepare(struct rethook_node *rhn, struct pt_regs *regs, bool mcount)
>
> What's the `mcount` parameter for, and why don't we seem to use it below?
>
> The presence of that parameter suggests there is a subtle interaction with
> ftrace, but the commit message doesn't mention anything of the sort. I'm really
> worried that has implications for the unwinder.

The mcount parameter is introduced only for the arm arch at this point. Other
archs ignore this parameter. See [3/4] for details. When the rethook caller
(ftrace or kprobe) hooks the function entry, how to get the real return address
depends on whether it is called from ftrace (mcount) or from a kprobe.
A kprobe is called from the first instruction of the function, so the real
return address can be found in regs->lr. But in the ftrace (mcount) context on
arm, we cannot find it in regs->lr, because that LR register is used for saving
the return address of mcount itself. On arm64, this seems to be fixed.
So I'm not sure whether this is a real issue on arm or a known limitation.
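
To make the arm64 side concrete, here is a small user-space model of what the
arch_rethook_prepare() hunk quoted below does with x30, and of what the
trampoline callback has to hand back. It is only a sketch: the mock_* structs,
the MOCK_TRAMPOLINE address and the demo function are made up for illustration
and are not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock stand-ins; NOT the kernel's struct pt_regs / struct rethook_node. */
struct mock_pt_regs {
	uint64_t regs[31];	/* x0..x30; regs[29] = fp, regs[30] = lr */
};

struct mock_rethook_node {
	uint64_t ret_addr;
	uint64_t frame;
};

/* Hypothetical address standing in for arch_rethook_trampoline. */
#define MOCK_TRAMPOLINE 0xffff800008001000ULL

/* Mirrors the arm64 hunk: 'mcount' is accepted but ignored. */
static int mock_rethook_prepare(struct mock_rethook_node *rhn,
				struct mock_pt_regs *regs, bool mcount)
{
	(void)mcount;
	rhn->ret_addr = regs->regs[30];
	rhn->frame = regs->regs[29];
	/* replace return addr (x30) with the trampoline */
	regs->regs[30] = MOCK_TRAMPOLINE;
	return 0;
}

/* What the trampoline callback must return: the saved return address. */
static uint64_t mock_trampoline_handler(const struct mock_rethook_node *rhn)
{
	return rhn->ret_addr;
}

/* Returns 0 if the hook/unhook round trip preserves the return address. */
static int mock_rethook_demo(void)
{
	struct mock_pt_regs regs = { .regs = { 0 } };
	struct mock_rethook_node rhn = { 0, 0 };

	regs.regs[29] = 0xffff80000a00fd00ULL;	/* made-up frame pointer */
	regs.regs[30] = 0xffff800008123456ULL;	/* made-up return address */

	mock_rethook_prepare(&rhn, &regs, false);
	assert(regs.regs[30] == MOCK_TRAMPOLINE);
	assert(rhn.frame == 0xffff80000a00fd00ULL);

	return mock_trampoline_handler(&rhn) == 0xffff800008123456ULL ? 0 : -1;
}
```

On arm the same shape would need a different source for the real return
address in the mcount case; this model does not try to show that.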

>
> I also see that (with these patches applied) it's possible to select
> CONFIG_FPROBE, but the commit message doesn't describe that either, and I'm not
> sure how to test any of that.

Ah, sorry. Fprobe requires this feature, so once HAVE_RETHOOK=y, CONFIG_FPROBE
becomes available on this architecture. If you enable
CONFIG_FPROBE_SANITY_TEST=y, this feature is tested through fprobe.
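
As a rough .config fragment, that would be just the two options named above.
This assumes that whatever they depend on (the function tracer and so on) is
already enabled in your config; I have not listed those prerequisites here.

```
CONFIG_FPROBE=y
CONFIG_FPROBE_SANITY_TEST=y
```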

>
> > +{
> > + rhn->ret_addr = regs->regs[30];
> > + rhn->frame = regs->regs[29];
> > +
> > + /* replace return addr (x30) with trampoline */
> > + regs->regs[30] = (u64)arch_rethook_trampoline;
> > + return 0;
> > +}
> > +NOKPROBE_SYMBOL(arch_rethook_prepare);
> > diff --git a/arch/arm64/kernel/rethook_trampoline.S b/arch/arm64/kernel/rethook_trampoline.S
> > new file mode 100644
> > index 000000000000..146d4553674c
> > --- /dev/null
> > +++ b/arch/arm64/kernel/rethook_trampoline.S
> > @@ -0,0 +1,87 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * trampoline entry and return code for rethook.
> > + * Renamed from arch/arm64/kernel/probes/kprobes_trampoline.S
> > + */
> > +
> > +#include <linux/linkage.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +
> > + .text
> > +
> > + .macro save_all_base_regs
> > + stp x0, x1, [sp, #S_X0]
> > + stp x2, x3, [sp, #S_X2]
> > + stp x4, x5, [sp, #S_X4]
> > + stp x6, x7, [sp, #S_X6]
> > + stp x8, x9, [sp, #S_X8]
> > + stp x10, x11, [sp, #S_X10]
> > + stp x12, x13, [sp, #S_X12]
> > + stp x14, x15, [sp, #S_X14]
> > + stp x16, x17, [sp, #S_X16]
> > + stp x18, x19, [sp, #S_X18]
> > + stp x20, x21, [sp, #S_X20]
> > + stp x22, x23, [sp, #S_X22]
> > + stp x24, x25, [sp, #S_X24]
> > + stp x26, x27, [sp, #S_X26]
> > + stp x28, x29, [sp, #S_X28]
> > + add x0, sp, #PT_REGS_SIZE
> > + stp lr, x0, [sp, #S_LR]
> > + /*
> > + * Construct a useful saved PSTATE
> > + */
> > + mrs x0, nzcv
> > + mrs x1, daif
> > + orr x0, x0, x1
> > + mrs x1, CurrentEL
> > + orr x0, x0, x1
> > + mrs x1, SPSel
> > + orr x0, x0, x1
>
> I realise this is just a copy of the existing kretprobe code, but it would be
> *really* nice if we could avoid faking the regs when we don't take an
> exception, like the FTRACE_WITH_ARGS thing.

No, since this is not a "function call". This is a function exit, so all of
the registers can be involved.

>
> The "useful saved PSTATE" is getting increasingly bogus these days, since it
> doesn't contain a bunch of values that are in the real PSTATE, and we don't
> restore anything other than NZCV anyway.

Hmm, that is a real bug. It should be fixed on kretprobes even without this
patch. Is there any good way to save those flags that are in the real PSTATE?

I need the arm64 maintainers' help to fix that. If there is a limitation on
getting the PSTATE, I need to decide whether to
- use a kprobe instead of hand-written assembly code (like powerpc) to hook
  the trampoline,
or
- just tell users that this is a limitation of kretprobe on arm64.
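
To check my understanding of the problem, here is a small user-space model of
the save/restore macros quoted above. The PSR_* bit values follow the arm64
PSTATE layout; the function names and the "live" PSTATE value are hypothetical.
It only shows that a bit such as PAN never reaches the saved value, and that
only NZCV comes back on restore.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the arm64 PSTATE/SPSR layout. */
#define PSR_N_BIT	0x80000000ULL
#define PSR_Z_BIT	0x40000000ULL
#define PSR_C_BIT	0x20000000ULL
#define PSR_V_BIT	0x10000000ULL
#define PSR_PAN_BIT	0x00400000ULL
#define PSR_I_BIT	0x00000080ULL

#define NZCV_MASK	(PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT)
#define DAIF_MASK	0x000003c0ULL	/* D, A, I, F */
#define CURRENTEL_MASK	0x0000000cULL
#define SPSEL_MASK	0x00000001ULL

/* Same OR-composition as save_all_base_regs: NZCV | DAIF | CurrentEL | SPSel. */
static uint64_t fake_saved_pstate(uint64_t real)
{
	return (real & NZCV_MASK) | (real & DAIF_MASK) |
	       (real & CURRENTEL_MASK) | (real & SPSEL_MASK);
}

/* Same mask as restore_all_base_regs: only NZCV is written back. */
static uint64_t restored_from(uint64_t saved)
{
	return saved & NZCV_MASK;
}

/* Returns 0 if PAN is lost on save and only NZCV survives the restore. */
static int pstate_demo(void)
{
	/* Hypothetical live state: N and C set, IRQs masked, EL1h, PAN set. */
	uint64_t real = PSR_N_BIT | PSR_C_BIT | PSR_I_BIT | 0x4 | 0x1 |
			PSR_PAN_BIT;
	uint64_t saved = fake_saved_pstate(real);

	if (saved & PSR_PAN_BIT)
		return -1;	/* PAN is not expected in the fake PSTATE */
	if (restored_from(saved) != (PSR_N_BIT | PSR_C_BIT))
		return -1;	/* only the condition flags should remain */
	return 0;
}
```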


>
> > + stp xzr, x0, [sp, #S_PC]
> > + .endm
> > +
> > + .macro restore_all_base_regs
> > + ldr x0, [sp, #S_PSTATE]
> > + and x0, x0, #(PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT)
> > + msr nzcv, x0
> > + ldp x0, x1, [sp, #S_X0]
> > + ldp x2, x3, [sp, #S_X2]
> > + ldp x4, x5, [sp, #S_X4]
> > + ldp x6, x7, [sp, #S_X6]
> > + ldp x8, x9, [sp, #S_X8]
> > + ldp x10, x11, [sp, #S_X10]
> > + ldp x12, x13, [sp, #S_X12]
> > + ldp x14, x15, [sp, #S_X14]
> > + ldp x16, x17, [sp, #S_X16]
> > + ldp x18, x19, [sp, #S_X18]
> > + ldp x20, x21, [sp, #S_X20]
> > + ldp x22, x23, [sp, #S_X22]
> > + ldp x24, x25, [sp, #S_X24]
> > + ldp x26, x27, [sp, #S_X26]
> > + ldp x28, x29, [sp, #S_X28]
> > + .endm
> > +
> > +SYM_CODE_START(arch_rethook_trampoline)
> > + sub sp, sp, #PT_REGS_SIZE
> > +
> > + save_all_base_regs
> > +
> > + /* Setup a frame pointer. */
> > + add x29, sp, #S_FP
> > +
> > + mov x0, sp
> > + bl arch_rethook_trampoline_callback
> > + /*
> > + * Replace trampoline address in lr with actual orig_ret_addr return
> > + * address.
> > + */
> > + mov lr, x0
> > +
> > + /* The frame pointer (x29) is restored with other registers. */
> > + restore_all_base_regs
> > +
> > + add sp, sp, #PT_REGS_SIZE
> > + ret
> > +
> > +SYM_CODE_END(arch_rethook_trampoline)
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index e4103e085681..5b717af4b555 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -8,6 +8,7 @@
> > #include <linux/export.h>
> > #include <linux/ftrace.h>
> > #include <linux/kprobes.h>
> > +#include <linux/rethook.h>
> > #include <linux/sched.h>
> > #include <linux/sched/debug.h>
> > #include <linux/sched/task_stack.h>
> > @@ -38,7 +39,7 @@ static notrace void start_backtrace(struct stackframe *frame, unsigned long fp,
> > {
> > frame->fp = fp;
> > frame->pc = pc;
> > -#ifdef CONFIG_KRETPROBES
> > +#if defined(CONFIG_RETHOOK)
> > frame->kr_cur = NULL;
> > #endif
> >
> > @@ -134,9 +135,9 @@ static int notrace unwind_frame(struct task_struct *tsk,
> > frame->pc = orig_pc;
> > }
> > #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> > -#ifdef CONFIG_KRETPROBES
> > - if (is_kretprobe_trampoline(frame->pc))
> > - frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp, &frame->kr_cur);
> > +#ifdef CONFIG_RETHOOK
> > + if (is_rethook_trampoline(frame->pc))
> > + frame->pc = rethook_find_ret_addr(tsk, frame->fp, &frame->kr_cur);
> > #endif
>
> Is it possible to have an fprobe and a regular ftrace graph caller call on the
> same ftrace callsite? ... or are those mutually exclusive?

Hmm, those are too different at this moment, but I would like to do that.
What I would like to do is:
- if the arch has no ftrace-graph implementation, rethook will use the
  kretprobe trampoline based implementation. This can drop some events
  because of the resource limitation.
- else, it will use the ftrace-graph implementation. This can be very
  efficient in some cases (like tracing a frequently called function) because
  it uses a per-task shadow stack. But not many archs have implemented it.
- Also, the interface needs to be unified. The rethook API is intended to be
  used for the integration. kretprobe and fprobe will use rethook, thus
  pt_regs is required.
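
By "resource limitation" in the first item I mean something like the following
user-space model: a fixed pool of return-hook nodes, where an entry that finds
the pool empty is simply not traced and only counted as missed. The names and
the pool size here are made up for the sketch (think of kretprobe's maxactive
and nmissed); this is not the rethook implementation itself.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 2	/* hypothetical; similar in spirit to maxactive */

struct mock_node {
	struct mock_node *next;
	unsigned long ret_addr;
};

struct mock_pool {
	struct mock_node nodes[POOL_SIZE];
	struct mock_node *free;		/* singly linked free list */
	unsigned long nmissed;		/* entries dropped for lack of a node */
};

static void pool_init(struct mock_pool *p)
{
	p->free = NULL;
	p->nmissed = 0;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->nodes[i].ret_addr = 0;
		p->nodes[i].next = p->free;
		p->free = &p->nodes[i];
	}
}

/* Take a node at function entry; NULL means this call is not traced. */
static struct mock_node *pool_try_get(struct mock_pool *p)
{
	struct mock_node *n = p->free;

	if (!n) {
		p->nmissed++;
		return NULL;
	}
	p->free = n->next;
	return n;
}

/* Give the node back once the hooked function has returned. */
static void pool_recycle(struct mock_pool *p, struct mock_node *n)
{
	n->next = p->free;
	p->free = n;
}

/* Returns 0 if the third concurrent entry is dropped and counted. */
static int pool_demo(void)
{
	struct mock_pool p;
	struct mock_node *a, *b, *c, *d;

	pool_init(&p);
	a = pool_try_get(&p);
	b = pool_try_get(&p);
	c = pool_try_get(&p);	/* pool exhausted: event dropped */
	if (!a || !b || c || p.nmissed != 1)
		return -1;

	pool_recycle(&p, a);
	d = pool_try_get(&p);	/* a node is available again */
	return d ? 0 : -1;
}
```

A per-task shadow stack avoids this shared pool, which is why the ftrace-graph
based approach is attractive for frequently called functions.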

>
> The existing code here depends on kretprobes and ftrace graph caller
> instrumentation nesting in a specific order, and I'm worried that might have
> changed.

Do you mean that "nesting" is using kretprobe and ftrace-graph on the same
function?
At this moment, rethook (kretprobe and fprobe) does not share the shadow stack
of the ftrace graph caller, so the nesting should work correctly.
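
Just to make sure we are talking about the same thing, here is a toy model of
the check order in the unwind_frame() hunk quoted above: the ftrace-graph
trampoline is resolved first, then the rethook trampoline, each from its own
store. The addresses and struct are invented for the sketch, and it does not
establish which hooking order really happens on a given callsite; it only
shows why the order matters for a single-pass lookup.

```c
#include <assert.h>
#include <stdint.h>

#define GRAPH_TRAMPOLINE	0x1000ULL /* stands in for the fgraph return trampoline */
#define RETHOOK_TRAMPOLINE	0x2000ULL /* stands in for arch_rethook_trampoline */
#define REAL_RET_ADDR		0x3000ULL /* made-up real return address */

/* Two independent stores, one per mechanism (mock, not kernel structs). */
struct mock_task {
	uint64_t graph_saved_ret;	/* what fgraph saved for this frame */
	uint64_t rethook_saved_ret;	/* what rethook saved for this frame */
};

/* Same order as the hunk: graph check first, rethook check second. */
static uint64_t mock_unwind_pc(const struct mock_task *t, uint64_t pc)
{
	if (pc == GRAPH_TRAMPOLINE)
		pc = t->graph_saved_ret;
	if (pc == RETHOOK_TRAMPOLINE)
		pc = t->rethook_saved_ret;
	return pc;
}

/* Returns 0 if the model behaves as described in the comments below. */
static int nesting_demo(void)
{
	/* rethook hooked first, graph second: LR ends up as the graph trampoline. */
	struct mock_task ok = {
		.graph_saved_ret = RETHOOK_TRAMPOLINE,
		.rethook_saved_ret = REAL_RET_ADDR,
	};
	/* graph hooked first, rethook second: LR ends up as the rethook trampoline. */
	struct mock_task swapped = {
		.graph_saved_ret = REAL_RET_ADDR,
		.rethook_saved_ret = GRAPH_TRAMPOLINE,
	};

	/* Resolved in one pass: graph -> rethook -> real address. */
	if (mock_unwind_pc(&ok, GRAPH_TRAMPOLINE) != REAL_RET_ADDR)
		return -1;
	/* Opposite order: the single pass stops at the graph trampoline. */
	if (mock_unwind_pc(&swapped, RETHOOK_TRAMPOLINE) != GRAPH_TRAMPOLINE)
		return -1;
	return 0;
}
```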

Thank you,

>
> Thanks,
> Mark.


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>