[RFC PATCH v3 0/2] Make eBPF programs output data to perf event

From: He Kuang
Date: Tue Jul 07 2015 - 07:44:02 EST


Hi,

The two previous versions tried to combine the bpf output data with the
sample event of the attached kprobe point, which led to problems with
perf_trace_buf.

After discussion we found that it is not necessary to combine those two
pieces of information; we do not even need the original kprobe output
event at all. Based on this idea the implementation becomes simple.
Just like what perf does with ftrace:function, we set up a bpf ftrace
entry for perf tools to poll and collect data from, and the eBPF program
uses a helper function to submit data to the ring buffer. That's all.
This implementation also leaves all issues such as sample types to the
perf command line.

Currently, we just put raw data in the format fields so as not to
interfere with the perf sample parser; the raw data can easily be
parsed by a perf script plugin.
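
As an illustration of how little a consumer has to do with such a raw
payload, here is a minimal userspace sketch. It is not part of this
series: decode_delta() is a hypothetical name, and it assumes the
payload is exactly the 8-byte u64 handed to the helper, in host byte
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patches: decode one raw payload
 * as submitted by the eBPF program.
 * Assumption: the payload is a single u64 in host byte order.
 * Returns 0 on success, -1 if the size does not match.
 */
static int decode_delta(const void *raw, size_t size, uint64_t *delta_ns)
{
	assert(raw != NULL && delta_ns != NULL);

	if (size != sizeof(*delta_ns))
		return -1;

	/* memcpy avoids unaligned loads from the raw buffer */
	memcpy(delta_ns, raw, sizeof(*delta_ns));
	return 0;
}
```

A real perf script plugin would get the buffer and its size from the
event fields; how those fields are named is left open here.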

The sample from patch v1, slightly modified:

SEC("generic_perform_write=generic_perform_write")
int NODE_generic_perform_write(struct pt_regs *ctx)
{
	char fmt[] = "generic_perform_write, cur=0x%llx, del=0x%llx\n";
	u64 cur_time, del_time;
	int ind = 0;
	struct time_table output, *last;

	last = bpf_map_lookup_elem(&global_time_table, &ind);
	if (!last)
		return 0;

	/* Time elapsed since the previous call, 0 on the first call */
	cur_time = bpf_ktime_get_ns();
	if (!last->last_time)
		del_time = 0;
	else
		del_time = cur_time - last->last_time;

	/* For debug */
	bpf_trace_printk(fmt, sizeof(fmt), cur_time, del_time);

	/* Update time table */
	output.last_time = cur_time;
	bpf_map_update_elem(&global_time_table, &ind, &output, BPF_ANY);

	/* This is an arbitrary condition, just to show the function */
	if (del_time < 1000)
		return 0;

	bpf_output_sample(&del_time, sizeof(del_time));

	return 0;
}

Record bpf events:

$ perf record -e ftrace:bpf -e sample.o -- dd if=/dev/zero of=test bs=4k count=3

The results shown by perf script:

$ perf script
dd 994 [000] 166.686779: ftrace:bpf: 8: (000000000542b426, ...)
dd 994 [000] 166.686779: ftrace:bpf: 8: (00000000001011ef, ...)
dd 994 [000] 166.686779: ftrace:bpf: 8: (000000000007a2b6, ...)
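
The value in parentheses is the del_time submitted by the program, in
hexadecimal. A small sketch for turning that field back into
nanoseconds, assuming the field has already been cut out of the line
(parse_delta_hex() is a hypothetical name):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the hexadecimal field printed by
 * perf script into a nanosecond count.
 * Returns 0 on success, -1 if nothing could be parsed.
 */
static int parse_delta_hex(const char *field, uint64_t *delta_ns)
{
	char *end;
	unsigned long long v;

	assert(field != NULL && delta_ns != NULL);

	errno = 0;
	v = strtoull(field, &end, 16);
	if (errno != 0 || end == field)
		return -1;

	*delta_ns = (uint64_t)v;
	return 0;
}
```

For the three lines above this gives 88257574 ns, 1053167 ns and
500406 ns, that is roughly 88.3 ms, 1.05 ms and 0.50 ms, all above the
1000 ns threshold used in the sample.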

Thank you.

He Kuang (2):
tracing: Add new trace type for bpf data output
bpf: Introduce function for outputing data to perf event

include/uapi/linux/bpf.h | 3 +++
kernel/trace/bpf_trace.c | 43 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 6 ++++++
kernel/trace/trace_entries.h | 18 ++++++++++++++++++
samples/bpf/bpf_helpers.h | 2 ++
5 files changed, 72 insertions(+)

--
1.8.5.2
