Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

From: Taeung Song
Date: Wed Jun 28 2017 - 12:27:56 EST




On 06/28/2017 06:53 PM, Milian Wolff wrote:
On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
Hi,

The --source-only option and new source code TUI view can show
the result of performance analysis based on full source code per
symbol(function). (Namhyung Kim told me this idea and it was also requested
by others some time ago..)

If someone wants to see the cause, he/she will need to dig into the asm.
But before that, looking at the source level can give a hint or clue
for the problem.

For example, if target symbol is 'hex2u64' of util/util.c,
the output is like below.

$ perf annotate --source-only --stdio -s hex2u64
Percent | Source code of util.c for cycles:ppp (42 samples)
-----------------------------------------------------------------
0.00 : 354 * While we find nice hex chars, build a long_val.
0.00 : 355 * Return number of chars processed.
0.00 : 356 */
0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
2.38 : 358 {
2.38 : 359 const char *p = ptr;
0.00 : 360 *long_val = 0;
0.00 : 361
30.95 : 362 while (*p) {
23.81 : 363 const int hex_val = hex(*p);
0.00 : 364
14.29 : 365 if (hex_val < 0)
0.00 : 366 break;
0.00 : 367
26.19 : 368 *long_val = (*long_val << 4) | hex_val;
0.00 : 369 p++;
0.00 : 370 }
0.00 : 371
0.00 : 372 return p - ptr;
0.00 : 373 }

And I added many perf developers into Cc: because I want to listen to your
opinions about this new feature, if you don't mind.

If you give some feedback, I'd appreciate it! :)

Thanks Taeung,

I requested this feature some time ago and it's really cool to see someone
step up and implement it - much appreciated!

Thank you so much, Milian !! :)


I just tested it out on my pet-example that leverages C++ instead of C:

~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 10000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
~~~~~

Compile it with:

g++ -O2 -g -std=c++11 test.cpp -o test

Then record it with perf:

perf record --call-graph dwarf ./test

Then analyse it with `perf report`. You'll see one entry for main with
something like:

+ 100.00% 39.69% cpp-inlining cpp-inlining [.] main

Select it and annotate it, then switch to your new source-only view:

main test.cpp
â 30
â 31 using namespace std;
â 32
â 33 int main()
â+ 34 {
â 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
â 36 default_random_engine engine;
â+ 37 double s = 0;
â+ 38 for (int i = 0; i < 10000000; ++i) {
4.88 â+ 39 s += norm(complex<double>(uniform(engine),
uniform(engine)));
â 40 }
â 41 cout << s << '\n';
â 42 return 0;
â+ 43 }

Note: the line numbers are off b/c my file contains a file-header on-top.
Ignore that.

Note2: There is no column header shown, so it's unclear what the first column
represents.

Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
shows 4.88... What is that?

What this shows, is that it's extremely important to visualize inclusive cost
_and_ self cost in this view. Additionally, we need to account for inlining.
Right now, we only see the self cost that is directly within main, I suspect.
For C++ this is usually very misleading, and basically makes the annotate view
completely useless for application-level profiling. If a second column would
be added with the inclusive cost with the ability to drill down, then I could
easily see myself using this view.

I would appreciate if you could take this into account.

Thanks a lot



Sure, I got it.
I'll investigate this weird case and recheck this patchset based on your comments,
and then I'll reply again. :)

Thanks,
Taeung