Cycles annotation support for perf tools

From: Andi Kleen
Date: Sun May 10 2015 - 09:54:51 EST


The upcoming Skylake CPU has a new timed branch stack feature
that reports cycle counts for individual branches in the
Last Branch Record (LBR).

This makes it possible to get fine-grained cost information for code,
and also to compute fine-grained IPC (instructions per cycle).
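Here is a minimal C sketch of where the per-block cost comes from. It makes several assumptions. The record is a simplified, hypothetical `struct lbr_entry` with `from`, `to` and `cycles`, which is not the kernel's branch entry layout. Records are ordered oldest to newest. Each record's cycle count covers the code executed since the previous taken branch. Under those assumptions, two consecutive records delimit one basic block, running from the older branch's target to the newer branch's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record; not the kernel ABI. */
struct lbr_entry {
	uint64_t from;    /* address of the taken branch */
	uint64_t to;      /* branch target address */
	uint16_t cycles;  /* assumed: cycles since the previous taken branch */
};

/* Code executed between two consecutive taken branches. */
struct basic_block {
	uint64_t start;   /* target of the older branch */
	uint64_t end;     /* address of the newer branch */
	uint16_t cycles;  /* cycles reported by the newer branch */
};

/*
 * Turn n records ordered oldest to newest into basic blocks.
 * out must have room for n - 1 blocks; returns the number written.
 */
static size_t lbr_to_blocks(const struct lbr_entry *e, size_t n,
			    struct basic_block *out)
{
	size_t nblocks = 0;

	assert(e != NULL && out != NULL);
	for (size_t i = 1; i < n; i++) {
		/* No timing information, or the range runs backwards. */
		if (e[i].cycles == 0 || e[i].from < e[i - 1].to)
			continue;
		out[nblocks].start = e[i - 1].to;
		out[nblocks].end = e[i].from;
		out[nblocks].cycles = e[i].cycles;
		nblocks++;
	}
	return nblocks;
}
```

Consider three records: 0x400500->0x400600, then 0x400620->0x400700 with 7 cycles, then 0x400710->0x400600 with 3 cycles. They yield the block 0x400600-0x400620 at 7 cycles and the block 0x400700-0x400710 at 3 cycles. Dividing a block's instruction count by its cycles then gives its IPC.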

This patchkit adds support for this in the perf tools:
- Basic support for the cycles field like other branch fields
- Show cycles in the standard branch sort view (no IPC here,
as IPC needs the instruction counts from annotation)
- Annotate cycles and IPC in the assembler annotate view
- Add branch support to perf top, so we can do live annotation.
- Miscellaneous support, such as dumping it in perf report -D

The kernel support has been posted separately. I included a test patch
to generate fake data for testing on existing systems.

Example output for annotate (with made-up numbers):

The second column is the IPC and the third is the average cycle count for the basic block.

                  │  static int hex(char ch)
                  │  {
        8.20      │      push   %rbp
        8.20      │      mov    %rsp,%rbp
        8.20      │      sub    $0x20,%rsp
        8.20      │      mov    %edi,%eax
        8.20      │      mov    %al,-0x14(%rbp)
        8.20      │      mov    %fs:0x28,%rax
        8.20      │      mov    %rax,-0x8(%rbp)
        8.20      │      xor    %eax,%eax
                  │    if ((ch >= '0') && (ch <= '9'))
        8.20      │      cmpb   $0x2f,-0x14(%rbp)
 66.67  8.20  123 │    ↓ jle    31
        8.20      │      cmpb   $0x39,-0x14(%rbp)
        8.20  123 │    ↓ jg     31
                  │    return ch - '0';
 22.22  8.20      │      movsbl -0x14(%rbp),%eax
        8.20      │      sub    $0x30,%eax
        8.20  123 │    ↓ jmp    60
                  │    if ((ch >= 'a') && (ch <= 'f'))
       17.57      │31:   cmpb   $0x60,-0x14(%rbp)
       17.57  123 │    ↓ jle    46
       17.57      │      cmpb   $0x66,-0x14(%rbp)
       17.57      │    ↓ jg     46
                  │    return ch - 'a' + 10;
       17.57      │      movsbl -0x14(%rbp),%eax

Example output for the branch view (again with fake data):

Overhead  Command  Source Shared Object  Source Symbol          Target Symbol          Basic Block Cycles
  30.08%  tcall    tcall                 [.] f1                 [.] f2                 123
  27.44%  tcall    tcall                 [.] f2                 [.] f1                 123
  15.60%  tcall    tcall                 [.] main               [.] f1                 123
  12.96%  tcall    tcall                 [.] f1                 [.] main               123
  12.86%  tcall    tcall                 [.] main               [.] main               123
   0.08%  tcall    [kernel.kallsyms]     [k] hrtimer_interrupt  [k] hrtimer_interrupt  123
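The cycles value is handled like the other branch fields. One way to picture this view, then, is as a grouping of sampled branches on (source symbol, target symbol, cycles), with Overhead as each group's share of the samples. The C sketch below assumes exactly that. `struct branch_sample`, `group_branches()` and `row_overhead()` are hypothetical names. The linear search stands in for whatever data structure perf really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sampled branch: source/target symbol plus its cycles. */
struct branch_sample {
	const char *from_sym;
	const char *to_sym;
	unsigned int cycles;
};

/* One row of the view: samples sharing all three sort keys. */
struct branch_row {
	struct branch_sample key;
	unsigned long count;
};

/* Compare on every sort key, the cycles field included. */
static int same_keys(const struct branch_sample *a,
		     const struct branch_sample *b)
{
	return strcmp(a->from_sym, b->from_sym) == 0 &&
	       strcmp(a->to_sym, b->to_sym) == 0 &&
	       a->cycles == b->cycles;
}

/*
 * Group n samples into rows (rows needs room for n entries) and return
 * the number of rows. Linear search keeps the sketch short.
 */
static size_t group_branches(const struct branch_sample *s, size_t n,
			     struct branch_row *rows)
{
	size_t nrows = 0;

	for (size_t i = 0; i < n; i++) {
		size_t r = 0;

		while (r < nrows && !same_keys(&rows[r].key, &s[i]))
			r++;
		if (r == nrows) {
			rows[nrows].key = s[i];
			rows[nrows].count = 0;
			nrows++;
		}
		rows[r].count++;
	}
	return nrows;
}

/* Overhead column: the row's share of all samples, in percent. */
static double row_overhead(const struct branch_row *row, size_t total)
{
	assert(total > 0);
	return 100.0 * (double)row->count / (double)total;
}
```

Consider four samples: f1->f2 (123), f2->f1 (123), f1->f2 (123) and f1->f2 (40). They give three rows. The first row, f1->f2 at 123 cycles, has an overhead of 50%.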

IPC computation has a few limitations (see the comments in the respective patches);
in particular, it punts on overlapping basic blocks.
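The patches describe the exact rule. As an illustration only, the C sketch below shows one simple way to detect the problematic case, assuming "punts" means skipping the IPC for such blocks. It treats each basic block as an inclusive address range (hypothetical `struct bb_range`). It then reports whether a candidate range shares addresses with a different known block. An instruction covered by two different blocks has no single per-block IPC, so a tool could skip such a block rather than guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range covered by a basic block. */
struct bb_range {
	uint64_t start;
	uint64_t end;   /* inclusive */
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool ranges_overlap(const struct bb_range *a, const struct bb_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Return true if cand shares any address with a block in blocks[0..n)
 * other than an identical range; this sketch then gives up on the IPC.
 */
static bool overlaps_other_block(const struct bb_range *cand,
				 const struct bb_range *blocks, size_t n)
{
	assert(cand != NULL && cand->start <= cand->end);
	for (size_t i = 0; i < n; i++) {
		bool same = blocks[i].start == cand->start &&
			    blocks[i].end == cand->end;

		if (!same && ranges_overlap(cand, &blocks[i]))
			return true;
	}
	return false;
}
```

Suppose the known blocks are 0x10-0x2f and 0x31-0x45. The range 0x20-0x2f is then flagged. The range 0x46-0x60 and an exact repeat of 0x10-0x2f are not.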

The cycles and IPC annotation only works in the interactive annotate view.
It currently does not work in the scripted perf annotate, as that is missing
much of the infrastructure needed for per-instruction state.

It would be nice to add column headers to annotate.

So far there is no support in --branch-history or in perf script.
