Re: [RFC PATCH -tip 3/3] x86/kprobes,orc: Unwind optprobe trampoline correctly

From: Masami Hiramatsu
Date: Wed Mar 31 2021 - 21:46:04 EST


On Wed, 31 Mar 2021 10:57:36 -0500
Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:

> On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote:
> > +#ifdef CONFIG_UNWINDER_ORC
> > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
> > +{
> > + unsigned long offset, entry, probe_addr;
> > + struct optimized_kprobe *op;
> > + struct orc_entry *orc;
> > +
> > + entry = find_kprobe_optinsn_slot_entry(addr);
> > + if (!entry)
> > + return addr;
> > +
> > + offset = addr - entry;
> > +
> > + /* Decode arg1 and get the optprobe */
> > + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
> > + if (!op)
> > + return addr;
> > +
> > + probe_addr = (unsigned long)op->kp.addr;
> > +
> > + if (offset < TMPL_END_IDX) {
> > + orc = orc_find((unsigned long)optprobe_template_func + offset);
> > + if (!orc || orc->sp_reg != ORC_REG_SP)
> > + return addr;
> > + /*
> > + * Since optprobe trampoline doesn't push caller on the stack,
> > + * need to decrement 1 stack entry size
> > + */
> > + *sp += orc->sp_offset - sizeof(long);
> > + return probe_addr;
> > + } else {
> > + return probe_addr + offset - TMPL_END_IDX;
> > + }
> > +}
> > +#endif
>
> Hm, I'd like to avoid intertwining kprobes and ORC like this.
>
> ORC unwinds other generated code by assuming the generated code uses a
> frame pointer. Could we do that here?

No, because the optprobe is not a function call. I considered to make
it call, but since it has to execute copied instructions directly on
the trampoline code (without changing stack frame) it is not possible.

> With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has
> ENCODE_FRAME_POINTER, but that's not going to work for ORC.

Even in that case, the problem is that any interrupt can happen
before doing ENCODE_FRAME_POINTER. I think this ENCODE_FRAME_POINTER
in the SAVE_REGS_STRING is for probing right before the target
function setup a frame pointer.

> Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the
> beginning of the template and 'pop %rbp' at the end?

No, since the trampoline code is not called, it is jumped into.
This means there is no "return address" in the stack. If we setup
the frame, there is no return address, thus it might stop there.
(Moreover, optprobe can copy multiple instructins on trampoline
buffer, since relative jump consumes 5bytes. where is the "return address"?)

>
> I guess SAVE_REGS_STRING would need to be smart enough to push the
> original saved version of %rbp. Of course then that breaks the
> kretprobe_trampoline() usage, so it may need to be a separate macro.
>
> [ Or make the same change to kretprobe_trampoline(). Then the other
> patch set wouldn't be needed either ;-) ]

Hmm, I don't think it is a good idea which making such change on the
optimized (hot) path only for the stack tracing. Moreover, that maybe
not transparent with the stack made by int3.

> Of course the downside is, when you get an interrupt during the frame
> pointer setup, unwinding is broken. But I think that's acceptable for
> generated code. We've lived with that limitation for all code, with
> CONFIG_FRAME_POINTER, for many years.

But above code can fix such issue too. To fix a corner case, non-generic
code may be required, even it is not so simple.

>
> Eventually we may want to have a way to register generated code (and the
> ORC for it).
>
> --
> Josh
>


--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>