Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event

From: Alexei Starovoitov
Date: Wed Jul 01 2015 - 23:52:27 EST

Next message: Dan Mick: "failing while writing to disk with BIOS in realmode setup"
Previous message: Alexei Starovoitov: "Re: [RFC PATCH 1/5] bpf: Put perf_events check ahead of bpf prog"
In reply to: He Kuang: "Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event"
Next in thread: Wangnan (F): "Re: [RFC PATCH 0/5] Make eBPF programs output data to perf event"

On 7/1/15 8:38 PM, He Kuang wrote:

On 2015/7/2 10:48, Alexei Starovoitov wrote:

On 7/1/15 4:58 AM, Peter Zijlstra wrote:

But why create a separate trace buffer? It should go into the regular
perf buffer.

+1

I think
+static char __percpu *perf_extra_trace_buf[PERF_NR_CONTEXTS];
is redundant.
It adds quite a bit of unnecessary complexity to the whole patch set.

Also the call to bpf_output_sample() has no effect unless the program
returns 1. It's a confusing user interface.

Also you cannot ever do:
BPF_FUNC_probe_read,
+ BPF_FUNC_output_sample,
BPF_FUNC_ktime_get_ns,
new functions must be added to the end.
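
To illustrate the point about ordering, here is a minimal sketch. The
enums below are simplified stand-ins, not the kernel's real
enum bpf_func_id, and the REL_/INS_/APP_ names and id_is_stable() are
hypothetical. It assumes only that helper IDs are plain enum values that
already-built programs have baked in:

```c
/* Illustrative helper IDs only; these are NOT the kernel's real values. */

/* IDs that already-built programs were compiled against. */
enum released_id { REL_probe_read, REL_ktime_get_ns };

/* New helper inserted in the middle: every later ID shifts by one. */
enum inserted_id { INS_probe_read, INS_output_sample, INS_ktime_get_ns };

/* New helper appended: existing IDs keep their values. */
enum appended_id { APP_probe_read, APP_ktime_get_ns, APP_output_sample };

/* Returns 1 when an already-released ID still names the same helper. */
int id_is_stable(int released, int updated)
{
    return released == updated;
}
```

With the inserted layout, a program built against the released IDs that
asks for ktime_get_ns would end up calling output_sample instead, because
the two now share the same number. Appending avoids that.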

Why not just do:
perf_trace_buf_prepare() + perf_trace_buf_submit() from the helper?
No changes to current code.
No need to call __get_data_size() and other overhead.
The helper can be called multiple times from the same program.
imo much cleaner.

Invoking perf_trace_buf_submit() will generate a second perf
event entry (header->type = PERF_RECORD_SAMPLE), which is
different from the event entry output by the original
kprobe. So the final result of the example in 00/00 patch may
look like this:

sample entry 1 (from bpf_prog):
  comm timestamp1 generic_perform_write pmu_value=0x1234
sample entry 2 (from original kprobe):
  comm timestamp2 generic_perform_write: (ffffffff81140b60)

Compared with current implementation:

combined sample entry:
  comm timestamp generic_perform_write: (ffffffff81140b60)
  pmu_value=0x1234

The former two entries may be discontinuous, as there are multiple
threads and kprobes being recorded, and there's a chance that one
entry is missed while the other is recorded. What we need is the
pmu_value read when 'generic_perform_write' is entered. The two-entry
result is not intuitive enough, and userspace tools have
to do the work of finding and combining those two sample entries to
get the result.

Just change your example to return 0 and user space will see
one sample.
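
A rough user-space model of that behaviour, as a sketch only. It assumes
the kprobe handler emits its own sample only when the attached program
returns nonzero, and that the proposed helper submits a sample by itself.
None of the names below (sample_log, helper_output_sample, run_kprobe)
are kernel APIs; they are hypothetical stand-ins:

```c
/* Counts the samples that would reach user space. */
struct sample_log {
    int from_helper;  /* samples submitted by the helper */
    int from_kprobe;  /* samples emitted by the kprobe itself */
};

typedef int (*prog_fn)(struct sample_log *log);

/* Stand-in for the proposed helper: it submits its own sample. */
void helper_output_sample(struct sample_log *log)
{
    log->from_helper++;
}

/* Program that outputs data and lets the kprobe sample through. */
int prog_returns_one(struct sample_log *log)
{
    helper_output_sample(log);
    return 1;
}

/* Program that outputs data and suppresses the kprobe sample. */
int prog_returns_zero(struct sample_log *log)
{
    helper_output_sample(log);
    return 0;
}

/* Model of the kprobe handler; returns the total number of samples. */
int run_kprobe(prog_fn prog, struct sample_log *log)
{
    /* A zero return from the program filters out the kprobe's sample. */
    if (prog(log) != 0)
        log->from_kprobe++;
    return log->from_helper + log->from_kprobe;
}
```

In this model, running prog_returns_one yields two samples per hit (the
case described above), while prog_returns_zero yields exactly one, the
one carrying the data from the helper.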

