[RFC PATCH -tip 0/2] kprobes: A trial to reuse graph-tracer's return stack for kretprobe

From: Masami Hiramatsu
Date: Mon Aug 21 2017 - 11:41:02 EST


Hello,

Here is a feasibility-study patch that uses the function_graph
tracer's per-thread return stack to store kretprobe
return addresses as a fast path.

Currently kretprobe has its own instance hash-list for storing
return addresses. However, this introduces a spin-lock for
the hash-list entry and compels users to estimate how many
probes run concurrently (and set that in kretprobe->maxactive).
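
To illustrate the maxactive problem, here is a minimal user-space
sketch, not kernel code. It models a fixed pool of return-address
slots and counts a miss whenever all slots are busy. Every name in it
(toy_pool, TOY_MAXACTIVE, and so on) is hypothetical, and the real
hash-list and its locking are deliberately left out.

```c
#include <assert.h>

/* Toy model only: pool size plays the role of kretprobe->maxactive. */
#define TOY_MAXACTIVE 2

struct toy_pool {
    unsigned long ret_addr[TOY_MAXACTIVE]; /* saved return addresses */
    int in_use[TOY_MAXACTIVE];             /* 1 while a call is in flight */
    unsigned long nmissed;                 /* entries dropped: pool was full */
};

/* Function entry: claim a free slot, or count a miss. Returns slot or -1. */
static int toy_pool_enter(struct toy_pool *p, unsigned long ret_addr)
{
    for (int i = 0; i < TOY_MAXACTIVE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            p->ret_addr[i] = ret_addr;
            return i;
        }
    }
    p->nmissed++;
    return -1;
}

/* Function return: release the slot and hand back the saved address. */
static unsigned long toy_pool_exit(struct toy_pool *p, int slot)
{
    assert(slot >= 0 && slot < TOY_MAXACTIVE && p->in_use[slot]);
    p->in_use[slot] = 0;
    return p->ret_addr[slot];
}
```

With TOY_MAXACTIVE set to 2, a third concurrent entry finds no free
slot and only bumps nmissed, which is the situation a too-small
maxactive produces.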

To solve this issue, this series reuses function_graph's per-thread
ret_stack for kretprobes as a fast path, instead of using the
hash-list, when possible. Note that if the kretprobe has a
custom entry_handler and stores data in kretprobe_instance,
we cannot use the fast path, since the current per-thread
return stack is fixed size. (This feature is used by some
systemtap scripts.)
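
The per-thread idea can be sketched in user space as follows. This is
an illustrative model under my own assumptions, not the code in this
series: the names (toy_thread, toy_ret_push, toy_ret_pop) and the
depth value are hypothetical. Each thread owns its stack, so no shared
lock and no maxactive estimate are needed, and each entry holds only
a return address, with no room for per-instance data.

```c
#include <assert.h>

/* Fixed depth, value chosen only for illustration. */
#define TOY_RET_STACK_DEPTH 8

struct toy_ret_entry {
    unsigned long ret_addr; /* address only: no per-probe data area */
};

/* One of these per thread, so pushes and pops never contend. */
struct toy_thread {
    struct toy_ret_entry ret_stack[TOY_RET_STACK_DEPTH];
    int curr; /* index of the top entry, -1 when empty */
};

static void toy_thread_init(struct toy_thread *t)
{
    t->curr = -1;
}

/* Function entry: save the real return address. Returns -1 if full. */
static int toy_ret_push(struct toy_thread *t, unsigned long ret_addr)
{
    if (t->curr + 1 >= TOY_RET_STACK_DEPTH)
        return -1;
    t->ret_stack[++t->curr].ret_addr = ret_addr;
    return 0;
}

/* Function return: restore the most recently saved return address. */
static unsigned long toy_ret_pop(struct toy_thread *t)
{
    assert(t->curr >= 0);
    return t->ret_stack[t->curr--].ret_addr;
}
```

Because calls on one thread nest strictly, a stack is enough: the
address popped on return is always the one pushed by the matching
entry, regardless of what other threads are doing.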

This series also includes a patch that shows the missed count of
kretprobes via ftrace's kprobe_profile interface, which
was posted this March. It is required for the
test case below. (Without it, we cannot see any
kretprobe miss count.)

Usage
=====
Note that this is just feasibility-study code, and since
the per-thread ret_stack is initialized only when the
function_graph tracer is enabled, you have to run the following
commands to enable it.

# echo '*' > <tracefs>/set_graph_notrace
# echo function_graph > <tracefs>/current_tracer

After that, try to add a kretprobe event with just 1
instance (it is not used anyway).

# echo r1 vfs_write > <tracefs>/kprobe_events
# echo 1 > <tracefs>/events/kprobes/enable

Then run the "yes" command concurrently.

# for i in {0..31}; do yes > /dev/null & done
# cat <tracefs>/kprobe_profile
r_vfs_write_0 4756473 0

Then you will see that the error count (the last column) is zero.
Currently, this feature is disabled when the function graph
tracer is stopped, so if you set the nop tracer as below,

# echo nop > <tracefs>/current_tracer

you'll see the error count increasing.

# cat <tracefs>/kprobe_profile
r_vfs_write_0 7663462 238537
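
For scripted checks, a line of this output can be parsed with a small
helper like the one below. This is only a sketch: the struct and
function names are hypothetical, and it assumes the three-column
layout shown above (event name, hit count, miss count).

```c
#include <stdio.h>

/* One parsed kprobe_profile line: "<event> <hits> <misses>". */
struct profile_entry {
    char event[64];
    unsigned long hits;
    unsigned long misses;
};

/* Returns 0 on success, -1 if the line does not have three fields. */
static int parse_profile_line(const char *line, struct profile_entry *e)
{
    int n = sscanf(line, "%63s %lu %lu", e->event, &e->hits, &e->misses);

    return n == 3 ? 0 : -1;
}
```

A nonzero misses field after switching to the nop tracer corresponds
to the error count seen in the last column above.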

This may improve the performance of kretprobe, but I haven't
benchmarked it yet.


TODO
====
This is just feasibility-study code; I haven't tested it
in depth, and it may still have some bugs. Anyway, if it looks good,
I would like to split the per-thread return stack code
out of ftrace and make it a new generic feature (e.g.
CONFIG_THREAD_RETURN_STACK) so that both kprobes and
ftrace can share it. It may also change return-stack
allocation to a direct call instead of an event handler.

Any comments?

Thank you,

---

Masami Hiramatsu (2):
trace: kprobes: Show sum of probe/retprobe nmissed count
kprobes/x86: Use graph_tracer's per-thread return stack for kretprobe


arch/x86/kernel/kprobes/core.c | 95 ++++++++++++++++++++++++++++++++++
include/linux/ftrace.h | 3 +
kernel/kprobes.c | 11 ++++
kernel/trace/trace_functions_graph.c | 5 +-
kernel/trace/trace_kprobe.c | 2 -
5 files changed, 112 insertions(+), 4 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@xxxxxxxxxx>