Re: x86/kprobes: kretprobe fails to triggered if kprobe at function entry is not optimized (trigger by int3 breakpoint)

From: Masami Hiramatsu
Date: Tue Aug 25 2020 - 10:03:26 EST


On Tue, 25 Aug 2020 15:30:05 +0200
peterz@xxxxxxxxxxxxx wrote:

> On Tue, Aug 25, 2020 at 10:15:55PM +0900, Masami Hiramatsu wrote:
>
> > > damn... one last problem is dangling instances.. so close.
> > > We can apparently unregister a kretprobe while there's still active
> > > kretprobe_instance's out referencing it.
> >
> > Yeah, kretprobe already provided the per-instance data (as far as
> > I know, only systemtap depends on it). We need to provide it for
> > such users.
> > But if we only have one lock, we can avoid checking NMI because
> > we can check the recursion with trylock. It is needed only if the
> > kretprobe uses per-instance data. Or we can just pass a dummy
> > instance on the stack.
>
> I think it is true in general, you can unregister a rp while tasks are
> preempted.

Do you mean that the kretprobe handler (or trampoline handler) will be
preempted? All kprobes handlers (including kretprobe handlers) run with
preemption disabled, so that shouldn't happen...

>
> Anyway, I think I have a solution, just need to talk to paulmck for a
> bit.

Ah, do you mean removing the kfree() from the trampoline handler?
I think we can make an RCU callback which will kfree() the given
instances (if that works in NMI context).
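
To illustrate the deferred-free idea, here is a rough user-space sketch,
not kernel code. Every demo_* name is made up for illustration, and
demo_reclaim() only stands in for whatever an RCU callback would do after
a grace period. The handler side never frees; it only pushes the instance
onto a lock-free list. Whether the real RCU interface may be used from NMI
context is the open question above, and this sketch does not answer it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kretprobe_instance. */
struct demo_instance {
	struct demo_instance *next;	/* link on the deferred-free list */
	int id;
};

/* Head of the deferred-free list; updated only with atomic operations. */
static _Atomic(struct demo_instance *) demo_free_list;

/* Allocate an instance; returns NULL on allocation failure. */
static struct demo_instance *demo_alloc(int id)
{
	struct demo_instance *ri = malloc(sizeof(*ri));

	if (ri) {
		ri->next = NULL;
		ri->id = id;
	}
	return ri;
}

/* Handler side: queue the instance instead of freeing it directly. */
static void demo_defer_free(struct demo_instance *ri)
{
	struct demo_instance *old = atomic_load(&demo_free_list);

	assert(ri != NULL);
	do {
		ri->next = old;
	} while (!atomic_compare_exchange_weak(&demo_free_list, &old, ri));
}

/*
 * Stand-in for the deferred callback: detach the whole list in one
 * atomic exchange, free every entry, and return how many were freed.
 */
static int demo_reclaim(void)
{
	struct demo_instance *ri = atomic_exchange(&demo_free_list, NULL);
	int freed = 0;

	while (ri) {
		struct demo_instance *next = ri->next;

		free(ri);
		ri = next;
		freed++;
	}
	return freed;
}
```

The point of the split is that the handler path does no freeing and takes
no lock; all freeing happens later in demo_reclaim().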

>
> > > Ignoring that issue for the moment, the below seems to actually work.
> >
> > OK, this looks good to me too.
> > I'll make a series to rewrite kretprobe based on this patch, OK?
>
> Please, I'll send the fix along when I have it.

OK, I'm planning to (1) add generic trampoline code, (2) clean up the
per-arch trampolines to use the generic one, and (3) rewrite the generic
trampoline to use lockless code. That way nothing should break along the
way.
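
As a very rough user-space model of what the shared part could look like
(all demo_* names and the fixed-depth array are assumptions for
illustration, not the planned kernel interface): each task keeps its own
stack of saved return addresses, so the common handler only touches data
owned by the current task and needs no lock.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 8

/* Hypothetical per-task stack of hijacked return addresses. */
struct demo_task {
	unsigned long ret_addr[DEMO_MAX_DEPTH];
	int depth;
};

/*
 * Entry side: remember the real return address.
 * Returns 0 on success, -1 if the stack is full (the probe is missed).
 */
static int demo_push_return(struct demo_task *t, unsigned long real_ret)
{
	if (t->depth >= DEMO_MAX_DEPTH)
		return -1;
	t->ret_addr[t->depth++] = real_ret;
	return 0;
}

/*
 * Generic part that every arch trampoline would call: pop the most
 * recent entry and return the address at which execution should resume.
 */
static unsigned long demo_trampoline_handler(struct demo_task *t)
{
	assert(t->depth > 0);
	return t->ret_addr[--t->depth];
}
```

In this model the per-arch code would shrink to saving registers, calling
demo_trampoline_handler(), and jumping to the returned address.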

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>