Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

From: Yang Jihong
Date: Fri Mar 26 2021 - 22:17:10 EST


Hello,

On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <yangjihong1@xxxxxxxxxx> wrote:
On 2021/3/19 20:35, Yang Jihong wrote:
In hist__find_annotations function, since different hist_entry may point to same
symbol, we free notes->src to signal already processed this symbol in stdio mode;
when annotate, entry will skipped if notes->src is NULL to avoid repeated output.

However, there is a problem, for example, run the following command:

# perf record -e branch-misses -e branch-instructions -a sleep 1

perf.data file contains different types of sample event.

If the same IP sample event exists in branch-misses and branch-instructions,
this event uses the same symbol. When annotate branch-misses events, notes->src
corresponding to this event is set to null, as a result, when annotate
branch-instructions events, this event is skipped and no annotate is output.

Solution of this patch is to remove zfree in hists__find_annotations and
change sort order to "dso,symbol" to avoid duplicate output when different
processes correspond to the same symbol.

Signed-off-by: Yang Jihong <yangjihong1@xxxxxxxxxx>

Acked-by: Namhyung Kim <namhyung@xxxxxxxxxx>

Without looking at the patch, just at its description of the problem, I
tried to annotate two events in a group, to get the annotate group view
output with both events, and it seems I'm getting samples accounted for
both events:

[root@five ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
[root@five ~]#
[root@five ~]# perf report | grep -v '^#' | head -5
17.49% 19.19% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarking::Run
12.17% 17.04% ThreadPoolForeg chromium-browser [.] v8::internal::Sweeper::RawSweep
11.14% 11.63% ThreadPoolForeg chromium-browser [.] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::ProcessStrongHeapObject<v8::internal::CompressedHeapObjectSlot>
7.65% 7.84% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::ShouldVisit
5.66% 6.72% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot

[root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
Percent
Disassembly of section .text:
0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
push %rbp
mov %rsp,%rbp
push %r15
push %r14
mov %rdi,%r14
push %r13
mov %edx,%r13d
push %r12
mov %ecx,%r12d
push %rbx
sub $0x1298,%rsp
mov %rsi,-0x1228(%rbp)
mov %fs:0x28,%rax
mov %rax,-0x38(%rbp)
<SNIP>
movzwl 0x2(%rbx),%eax
test %ax,%ax
↓ jne 4a9
mov -0x10e8(%rbp),%rdx
cmpw $0x0,0x2(%rdx)
0.41 0.39 ↓ je 4b90
movq %rbx,%xmm0
movq %rdx,%xmm2
mov %rdx,%rbx
punpcklqdq %xmm2,%xmm0
movups %xmm0,-0x10e8(%rbp)
movzwl 0x2(%rdx),%eax
4a9: sub $0x1,%eax
mov %ax,0x2(%rbx)
0.36 0.91 movzwl %ax,%eax
0.60 0.00 mov 0x10(%rbx,%rax,8),%rax
3.44 2.46 mov %rax,-0x11e0(%rbp)
0.00 0.34 4bf: mov 0x8(%r13),%rax
0.00 0.36 add $0x1,%r15d
0.00 0.34 mov 0x110(%rax),%rax
mov 0x128(%rax),%rcx
0.88 0.36 mov 0x8(%r13),%rax
mov 0x110(%rax),%rdx
mov 0x130(%rdx),%rdx
0.00 0.48 mov 0x140(%rax),%rax
mov 0x110(%rax),%rsi
0.61 0.47 mov -0x11e0(%rbp),%rax
2.01 2.32 sub $0x1,%rax
cmp %rcx,%rax
0.00 0.35 setae %cl
1.31 0.33 cmp %rdx,%rax
setb %dl
0.00 0.24 test %dl,%cl
0.00 0.12 ↓ jne 4b70
cmp %rsi,%rax
↓ je 4b70
mov (%rax),%eax
29.10 29.90 add -0x1220(%rbp),%rax
cmpb $0x0,-0x1218(%rbp)
mov %rax,-0x1210(%rbp)
0.00 0.65 ↓ jne 4fa0
mov -0x11e0(%rbp),%r9
lea 0x6(%rax),%rbx
0.38 0.00 545: movzbl (%rbx),%eax
4.90 5.34 cmp $0x4c,%al
↓ ja 5026
0.58 0.00 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
movslq (%rdi,%rax,4),%rax
4.97 3.19 add %rdi,%rax
<SNIP>

If I ask for number of samples:

[root@five ~]# perf config annotate.show_nr_samples=true
[root@five ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run

Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
Samples
Disassembly of section .text:
0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
push %rbp
mov %rsp,%rbp
push %r15
push %r14
mov %rdi,%r14
push %r13
mov %edx,%r13d
push %r12
mov %ecx,%r12d
push %rbx
sub $0x1298,%rsp
mov %rsi,-0x1228(%rbp)
mov %fs:0x28,%rax
mov %rax,-0x38(%rbp)
<SNIP>
movzwl 0x2(%rbx),%eax
test %ax,%ax
↓ jne 4a9
mov -0x10e8(%rbp),%rdx
cmpw $0x0,0x2(%rdx)
1 1 ↓ je 4b90
movq %rbx,%xmm0
movq %rdx,%xmm2
mov %rdx,%rbx
punpcklqdq %xmm2,%xmm0
movups %xmm0,-0x10e8(%rbp)
movzwl 0x2(%rdx),%eax
4a9: sub $0x1,%eax
mov %ax,0x2(%rbx)
1 2 movzwl %ax,%eax
1 0 mov 0x10(%rbx,%rax,8),%rax
8 5 mov %rax,-0x11e0(%rbp)
0 1 4bf: mov 0x8(%r13),%rax
0 1 add $0x1,%r15d
0 1 mov 0x110(%rax),%rax
mov 0x128(%rax),%rcx
3 1 mov 0x8(%r13),%rax
mov 0x110(%rax),%rdx
mov 0x130(%rdx),%rdx
0 1 mov 0x140(%rax),%rax
mov 0x110(%rax),%rsi
2 1 mov -0x11e0(%rbp),%rax
6 6 sub $0x1,%rax
cmp %rcx,%rax
0 1 setae %cl
2 1 cmp %rdx,%rax
setb %dl
0 1 test %dl,%cl
0 1 ↓ jne 4b70
cmp %rsi,%rax
↓ je 4b70
mov (%rax),%eax
58 73 add -0x1220(%rbp),%rax
cmpb $0x0,-0x1218(%rbp)
mov %rax,-0x1210(%rbp)
0 1 ↓ jne 4fa0
mov -0x11e0(%rbp),%r9
lea 0x6(%rax),%rbx
1 0 545: movzbl (%rbx),%eax
10 13 cmp $0x4c,%al
↓ ja 5026
1 0 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
movslq (%rdi,%rax,4),%rax
9 8 add %rdi,%rax

<SNIP>

So it seems to be working, what am I missing? Is this strictly non
group related?

Yes, it is non group related.
This problem occurs only when different events need to be recorded at the same time, i.e.:
perf record -e branch-misses -e branch-instructions -a sleep 1

The output results of perf script and perf annotate do not match.
Some events are not output in perf annotate.

- Arnaldo
.