[RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function
From: Jiri Olsa
Date: Wed Jan 24 2018 - 06:51:52 EST
hi,
this RFC contains a change to defer retrieval of a sample's
user space data into task work, as originally described and
discussed by Peter and Ingo in [1].
This patchset tries to follow the original patch with
some kernel changes (described below) and perf tool
support included.
Basically we allow the NMI event code to skip user data
retrieval and schedule task work to do it, before the
task resumes.
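As a rough illustration of that flow (not code from the patchset), here is
a small user space model in C. All names in it (sim_task, sim_sample,
sim_nmi_sample, sim_return_to_user) are hypothetical, and linking the
delayed record to its sample through an ID is an assumption based only on
the PERF_SAMPLE_USER_DATA_ID patch title:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_STACK_BYTES 16

/* Hypothetical stand-in for per-task state; NOT the kernel's task_struct. */
struct sim_task {
	unsigned char user_stack[SIM_STACK_BYTES]; /* pretend user memory */
	bool work_pending;                          /* deferred retrieval queued */
	unsigned long pending_id;                   /* assumed link to the sample */
};

/* Hypothetical record; used for both the sample and the delayed user data. */
struct sim_sample {
	unsigned long id;
	bool has_user_data;
	unsigned char user_data[SIM_STACK_BYTES];
};

/* "NMI" path: store the cheap part of the sample, skip the user copy. */
static void sim_nmi_sample(struct sim_task *t, struct sim_sample *s,
			   unsigned long id)
{
	assert(t && s);
	s->id = id;
	s->has_user_data = false;
	t->work_pending = true;   /* schedule the "task work" */
	t->pending_id = id;
}

/* "Task work" path: runs before the task resumes in user space. */
static bool sim_return_to_user(struct sim_task *t, struct sim_sample *out)
{
	assert(t && out);
	if (!t->work_pending)
		return false;     /* nothing was deferred */
	out->id = t->pending_id;
	memcpy(out->user_data, t->user_stack, SIM_STACK_BYTES);
	out->has_user_data = true;
	t->work_pending = false;
	return true;
}
```

Calling sim_nmi_sample() and then sim_return_to_user() on the same task
yields a sample without user data followed by a second record that carries
the stack bytes and the same ID.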
Using task work limits the window in which we can do
this. We can use the delayed task work only if it
gets executed before the process runs again after the
NMI, because we need its stack as it was at NMI time.
That leaves us with a window during the slow syscall path
(check task_struct::perf_user_data_allowed in patches).
The slow syscall processing is forced for a task when
the user data event is enabled, which makes the task
slower.
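The window check could be modeled roughly like this. It is only a sketch:
the field name perf_user_data_allowed comes from the text above, while
win_task, the win_* helpers, and the fallback of copying in NMI context
when the flag is clear are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* What the "NMI" handler does with the user data for one sample. */
enum win_action {
	WIN_COPY_IN_NMI,        /* assumed fallback: retrieve right away */
	WIN_DEFER_TO_TASK_WORK, /* retrieve later, before return to user */
};

/* Hypothetical stand-in for the relevant bit of task state. */
struct win_task {
	bool perf_user_data_allowed;
};

/* Slow syscall entry opens the window (hypothetical helper). */
static void win_syscall_enter_slow(struct win_task *t)
{
	assert(t);
	t->perf_user_data_allowed = true;
}

/* Leaving the syscall path closes the window (hypothetical helper). */
static void win_syscall_exit(struct win_task *t)
{
	assert(t);
	t->perf_user_data_allowed = false;
}

/* Deferral is only safe while the user stack cannot change under us. */
static enum win_action win_nmi_decide(const struct win_task *t)
{
	assert(t);
	return t->perf_user_data_allowed ? WIN_DEFER_TO_TASK_WORK
					 : WIN_COPY_IN_NMI;
}
```

In this model an NMI that lands between win_syscall_enter_slow() and
win_syscall_exit() defers the retrieval, and any other NMI does not.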
On the other hand I noticed a roughly 100us drop in NMI
processing times, which I plotted in [2].
I'm not sure it's worth introducing this processing, which adds
more processing time and does not show much improvement. On
the other hand, IIRC Peter mentioned it'd be nice to get user
space data retrieval out of NMI.
Also, maybe you guys can think of some other better/faster way ;-)
NOTE: I also implemented putting the user stack data into
delayed processing, which showed nicer numbers. But it's
a little more tricky and brings more changes into this already
big patchset. The logic stays the same, so I did not include it
to keep the patchset simple.
Also available in:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/user_data
thanks for comments,
jirka
[1] https://marc.info/?l=linux-kernel&m=150098372819938&w=2
[2] http://people.redhat.com/~jolsa/ud-bench.png
---
Jiri Olsa (21):
perf tools: Add perf_evsel__is_sample_bit function
perf tools: Add perf_sample__process function
perf tools: Add callchain__printf for pure callchain dump
perf tools: Add perf_sample__copy|free functions
perf: Add TIF_PERF_USER_DATA bit
perf: Add PERF_RECORD_USER_DATA event processing
perf: Add PERF_SAMPLE_USER_DATA_ID sample type
perf: Add PERF_SAMPLE_CALLCHAIN to user data event
perf: Export running sample length values through debugfs
perf tools: Sync perf_event.h uapi header
perf tools: Add perf_sample__parse function
perf tools: Add struct parse_args arg to perf_sample__parse
perf tools: Add support to parse user data event
perf tools: Add support to dump user data event info
perf report: Add delayed user data event processing
perf record: Enable delayed user data events
perf script: Add support to display user data events
perf script: Add support to display user data ID
perf script: Display USER_DATA misc char for sample
perf report: Add user data processing stats
perf report: Add --stats=ud option to display user data debug info
arch/x86/entry/common.c | 6 +++
arch/x86/events/core.c | 18 ++++++++
arch/x86/events/intel/ds.c | 4 +-
arch/x86/include/asm/thread_info.h | 4 +-
include/linux/init_task.h | 4 +-
include/linux/perf_event.h | 3 ++
include/linux/sched.h | 20 ++++++++
include/uapi/linux/perf_event.h | 34 +++++++++++++-
kernel/events/core.c | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
tools/include/uapi/linux/perf_event.h | 34 +++++++++++++-
tools/perf/Documentation/perf-script.txt | 3 +-
tools/perf/builtin-record.c | 2 +
tools/perf/builtin-report.c | 301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
tools/perf/builtin-script.c | 98 +++++++++++++++++++++++++++++++++++++++
tools/perf/perf.h | 1 +
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 9 ++++
tools/perf/util/evsel.c | 118 +++++++++++++++++++++++++++++++++++++----------
tools/perf/util/evsel.h | 5 ++
tools/perf/util/session.c | 60 +++++++++++++++++++-----
tools/perf/util/thread.c | 1 +
tools/perf/util/thread.h | 16 +++++++
tools/perf/util/tool.h | 1 +
23 files changed, 954 insertions(+), 72 deletions(-)