On 2015/7/2 10:48, Alexei Starovoitov wrote:
On 7/1/15 4:58 AM, Peter Zijlstra wrote:
But why create a separate trace buffer, it should go into the regular
perf buffer.
+1
I think
+static char __percpu *perf_extra_trace_buf[PERF_NR_CONTEXTS];
is redundant.
It adds quite a bit of unnecessary complexity to the whole patch set.
Also the call to bpf_output_sample() is not effective unless program
returns 1. It's a confusing user interface.
Also you cannot ever do:
BPF_FUNC_probe_read,
+ BPF_FUNC_output_sample,
BPF_FUNC_ktime_get_ns,
new functions must be added to the end.
Why not just do:
perf_trace_buf_prepare() + perf_trace_buf_submit() from the helper?
No changes to current code.
No need to call __get_data_size() and other overhead.
The helper can be called multiple times from the same program.
imo much cleaner.
Invoke perf_trace_buf_submit() will generate a second perf
event (header->type = PERF_RECORD_SAMPLE) entry which is
different from the event entry outputed by the orignial
kprobe. So the final result of the example in 00/00 patch may
like this:
sample entry 1(from bpf_prog):
comm timestamp1 generic_perform_write pmu_value=0x1234
sample entry 2(from original kprobe):
comm timestamp2 generic_perform_write: (ffffffff81140b60)
Compared with current implementation:
combined sample entry:
comm timestamp generic_perform_write: (ffffffff81140b60)
pmu_value=0x1234
The former two entries may be discontinuous as there are multiple
threads and kprobes to be recorded, and there's a chance that one
entry is missed but the other is recorded. What we need is the
pmu_value read when 'generic_perform_write' enters, the two
entries result is not intuitive enough and userspace tools have
to do the work to find and combine those two sample entries to
get the result.