Re: [PATCH V5 1/1] bpf: control events stored in PERF_EVENT_ARRAY maps trace data output when perf sampling
From: Wangnan (F)
Date: Thu Oct 22 2015 - 06:33:33 EST
On 2015/10/22 17:06, Peter Zijlstra wrote:
On Wed, Oct 21, 2015 at 02:19:49PM -0700, Alexei Starovoitov wrote:
Urgh, that's still horridly inconsistent. Can we please come up with a
consistent interface to perf?
My suggestion was to do ioctl(enable/disable) of events from userspace
after receiving notification from kernel via my bpf_perf_event_output()
helper.
Wangnan's argument was that display refresh happens often and it's fast,
so the time taken by user space to enable events on all cpus is too
slow and ioctl does ipi and disturbs other cpus even more.
So soft_disable done by the program to enable/disable particular events
on all cpus kinda makes sense.
And this all makes me think I still have no clue what you're all trying
to do here.
Who cares about display updates and why. And why should there be an
active userspace part to eBPF programs?
So you want the background story? OK, let me describe it. This mail is not
short, so please be patient.
On a smartphone, if the time between two frames is longer than 16ms, the
user can notice it. This is a display glitch. We want to examine these
glitches with perf to find what causes them. The basic idea is: use
'perf record' to collect enough information, find the samples generated
just before the glitch in perf.data, then analyze them offline.
There is a lot of work to be done before perf can support such analysis.
One important thing is to reduce perf's own overhead, so that perf itself
does not become the cause of glitches. We could do this by reducing the
number of events perf collects, but then we won't have enough information
to analyze when a glitch happens. Another way, which we are trying to
implement now, is to dynamically turn events on and off, or at least to
enable/disable sampling dynamically, because the overhead of copying
samples is a big part of perf's total overhead. Then we can trace as many
events as possible, but only fetch data from them when we detect a glitch.
A BPF program is the best choice to describe our relatively complex glitch
detection model. For simplicity, you can think of our glitch detection
model as containing 3 points in user programs: point A means display
refreshing begins, point B means display refreshing has lasted for 16ms,
and point C means display refreshing has finished. They can be found in
user programs and probed through uprobes, to which we can attach BPF
programs.
Then we want to use perf this way:
Sample the 'cycles' event to collect callgraphs so we know what all CPUs
are doing during the refresh, but only start sampling when we detect that
a frame refresh has lasted for 16ms.
We can translate the above logic into BPF programs: at point B, enable the
'cycles' event to generate samples; at point C, disable the 'cycles' event
to avoid generating useless samples.
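To make the A/B/C model concrete, here is a minimal userspace C sketch of
the per-frame state the probe handlers would keep. It is a simulation, not
a BPF program: struct frame_tracker, on_point_a/b/c() and FRAME_BUDGET_NS
are hypothetical names, and re-checking the elapsed time at point B is an
assumption added for illustration (in the model above, point B itself
already means "16ms have passed").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame budget: 16ms, as in the glitch definition above. */
#define FRAME_BUDGET_NS (16ULL * 1000 * 1000)

/* Hypothetical per-frame state kept by the probe handlers. */
struct frame_tracker {
    uint64_t begin_ns;  /* timestamp recorded at point A */
    bool in_frame;      /* true between point A and point C */
    bool sampling;      /* whether 'cycles' samples are being generated */
};

/* Point A: display refreshing begins; remember when. */
static void on_point_a(struct frame_tracker *t, uint64_t now_ns)
{
    t->begin_ns = now_ns;
    t->in_frame = true;
}

/* Point B: refreshing has lasted for the frame budget; start sampling.
 * The elapsed-time check is an assumption of this sketch. */
static void on_point_b(struct frame_tracker *t, uint64_t now_ns)
{
    if (t->in_frame && now_ns - t->begin_ns >= FRAME_BUDGET_NS)
        t->sampling = true;
}

/* Point C: refreshing finished; stop sampling to avoid useless samples. */
static void on_point_c(struct frame_tracker *t)
{
    t->in_frame = false;
    t->sampling = false;
}
```

A frame that finishes within 16ms never turns sampling on; only frames
that cross point B produce samples, and only until point C.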
Then, use 'perf record -e cycles' to collect callgraphs and other
information through the cycles event. From perf.data, we can use
'perf script' and other tools to analyze the resulting data.
We have considered some potential solutions and found them either
inappropriate or requiring too much work:
1. As you may prefer, create BPF functions that call pmu->stop() /
pmu->start() for the perf event on the CPU on which the BPF program is
triggered.
The shortcoming of this method is that we can only turn on the perf event
on the CPU executing point B, so we cannot know what the other CPUs are
doing during the glitch, while what we want is system-wide information.
In addition, point B and point C are not necessarily executed on the same
core, so we may shut down the wrong event if the scheduler decides to run
point C on another core.
2. As Alexei suggested, output something through his
bpf_perf_event_output(), and let perf disable and enable those events
using ioctl in userspace.
This is a good idea, but it introduces an asynchrony problem.
bpf_perf_event_output() writes something to perf's ring buffer, but perf
only notices the message when epoll_wait() returns. We tested a
tracepoint event and found that perf may become aware of an event several
seconds after it was generated. One solution is to use --no-buffer, but
that still needs perf to be scheduled on time and to parse its ring
buffer quickly. Also, please note that point C may appear very shortly
after point B, because some apps are optimized to make their refresh time
very close to 16ms.
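The CPU-mismatch problem of option 1 can be shown with a small userspace
C model. This is only a sketch under stated assumptions: NR_CPUS,
cycles_running[], local_event_start(), local_event_stop() and
cpus_sampling() are hypothetical stand-ins for a per-CPU 'cycles' event
and a helper that can only call pmu->start()/pmu->stop() on the current
CPU; they are not kernel APIs.

```c
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical machine size for this model */

/* Hypothetical per-CPU 'cycles' event state. */
static bool cycles_running[NR_CPUS];

/* Stands in for pmu->start() on the CPU running the BPF program only. */
static void local_event_start(int this_cpu)
{
    cycles_running[this_cpu] = true;
}

/* Stands in for pmu->stop() on the CPU running the BPF program only. */
static void local_event_stop(int this_cpu)
{
    cycles_running[this_cpu] = false;
}

/* How many CPUs are currently generating samples. */
static int cpus_sampling(void)
{
    int n = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        n += cycles_running[cpu];
    return n;
}
```

If point B runs on CPU 0, local_event_start(0) leaves only 1 of 4 CPUs
sampling; if point C is then scheduled on CPU 2, local_event_stop(2)
stops nothing useful and CPU 0 keeps generating samples.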
This is the background story. It looks like we either implement some
cross-CPU control or settle for coarse-grained control. The best method
we can think of is to use atomic operations to soft enable/disable perf
events. We believe it is simple enough and won't cause problems.
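A minimal C11 sketch of the atomic soft enable/disable idea, under stated
assumptions: struct soft_event, soft_event_set() and soft_event_overflow()
are hypothetical names, not the interface in the patch, and the two
counters merely stand in for "sample copied to the ring buffer" versus
"sample dropped". The counter keeps running on every CPU; only the
expensive copy is gated by one shared flag, so a BPF program on any CPU
can flip it with a single atomic store, with no IPI and no round trip
through userspace.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event: counting always runs; output is gated. */
struct soft_event {
    atomic_bool soft_enabled;     /* shared by all CPUs */
    atomic_ulong samples_copied;  /* stands in for ring-buffer output */
    atomic_ulong samples_dropped; /* overflows skipped while disabled */
};

static void soft_event_init(struct soft_event *e, bool enabled)
{
    atomic_init(&e->soft_enabled, enabled);
    atomic_init(&e->samples_copied, 0);
    atomic_init(&e->samples_dropped, 0);
}

/* Called at point B (true) or point C (false), on whichever CPU runs it. */
static void soft_event_set(struct soft_event *e, bool enabled)
{
    atomic_store_explicit(&e->soft_enabled, enabled, memory_order_release);
}

/* Called from the overflow handler on any CPU; returns true if the
 * sample would be copied out. */
static bool soft_event_overflow(struct soft_event *e)
{
    if (!atomic_load_explicit(&e->soft_enabled, memory_order_acquire)) {
        atomic_fetch_add_explicit(&e->samples_dropped, 1,
                                  memory_order_relaxed);
        return false;
    }
    atomic_fetch_add_explicit(&e->samples_copied, 1, memory_order_relaxed);
    return true;
}
```

Because the flag is global rather than per-CPU, point B and point C may
run on different cores and still toggle the same event.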
Thank you.