Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view

From: Taeung Song
Date: Fri Jun 30 2017 - 12:22:09 EST


I'm late..

On 06/28/2017 06:53 PM, Milian Wolff wrote:
On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
Hi,

The --source-only option and new source code TUI view can show
the result of performance analysis based on full source code per
symbol(function). (Namhyung Kim told me this idea and it was also requested
by others some time ago..)

If someone wants to see the cause, he/she will need to dig into the asm.
But before that, looking at the source level can give a hint or clue
for the problem.

For example, if target symbol is 'hex2u64' of util/util.c,
the output is like below.

$ perf annotate --source-only --stdio -s hex2u64
Percent | Source code of util.c for cycles:ppp (42 samples)
-----------------------------------------------------------------
0.00 : 354 * While we find nice hex chars, build a long_val.
0.00 : 355 * Return number of chars processed.
0.00 : 356 */
0.00 : 357 int hex2u64(const char *ptr, u64 *long_val)
2.38 : 358 {
2.38 : 359 const char *p = ptr;
0.00 : 360 *long_val = 0;
0.00 : 361
30.95 : 362 while (*p) {
23.81 : 363 const int hex_val = hex(*p);
0.00 : 364
14.29 : 365 if (hex_val < 0)
0.00 : 366 break;
0.00 : 367
26.19 : 368 *long_val = (*long_val << 4) | hex_val;
0.00 : 369 p++;
0.00 : 370 }
0.00 : 371
0.00 : 372 return p - ptr;
0.00 : 373 }

And I added many perf developers into Cc: because I want to listen to your
opinions about this new feature, if you don't mind.

If you give some feedback, I'd appreciate it! :)

Thanks Taeung,

I requested this feature some time ago and it's really cool to see someone
step up and implement it - much appreciated!

I just tested it out on my pet-example that leverages C++ instead of C:

~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 10000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
~~~~~

Compile it with:

g++ -O2 -g -std=c++11 test.cpp -o test

Then record it with perf:

perf record --call-graph dwarf ./test

Then analyse it with `perf report`. You'll see one entry for main with
something like:

+ 100.00% 39.69% cpp-inlining cpp-inlining [.] main

Select it and annotate it, then switch to your new source-only view:

main test.cpp
â 30
â 31 using namespace std;
â 32
â 33 int main()
â+ 34 {
â 35 uniform_real_distribution<double> uniform(-1E5, 1E5);
â 36 default_random_engine engine;
â+ 37 double s = 0;
â+ 38 for (int i = 0; i < 10000000; ++i) {
4.88 â+ 39 s += norm(complex<double>(uniform(engine),
uniform(engine)));
â 40 }
â 41 cout << s << '\n';
â 42 return 0;
â+ 43 }

Note: the line numbers are off b/c my file contains a file-header on-top.
Ignore that.

Note2: There is no column header shown, so it's unclear what the first column
represents.

Okey, I'll add the first column.


Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
shows 4.88... What is that?

My case is different from your result..but I'll keep digging things
related to the inclusive and self cost to understand the above case, remaking v2 patchset.


What this shows, is that it's extremely important to visualize inclusive cost
_and_ self cost in this view. Additionally, we need to account for inlining.
Right now, we only see the self cost that is directly within main, I suspect.

I handled only one source code file on this patchset,
so I also think it seems to be needed to check other source code file to handle inlining..

For C++ this is usually very misleading, and basically makes the annotate view
completely useless for application-level profiling. If a second column would
be added with the inclusive cost with the ability to drill down, then I could
easily see myself using this view.

Okey, will try to make the feature for 'drill down' after this basic patchset.


BTW, I'll resend v2 patchset later

- considering several source files when getting srcline info
- considering inline function
- investigating the inclusive and self cost

Thanks,
Taeung


I would appreciate if you could take this into account.

Thanks a lot