RE: [patch] x86, perf_counter, bts: add bts to perf_counter

From: Metzger, Markus T
Date: Fri Aug 07 2009 - 08:14:55 EST


>-----Original Message-----
>From: Ingo Molnar [mailto:mingo@xxxxxxx]
>Sent: Friday, August 07, 2009 1:30 PM
>To: Metzger, Markus T
>Cc: Peter Zijlstra; tglx@xxxxxxxxxxxxx; hpa@xxxxxxxxx; markus.t.metzger@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>Subject: Re: [patch] x86, perf_counter, bts: add bts to perf_counter
>
>
>* Ingo Molnar <mingo@xxxxxxx> wrote:
>
>> btw., the number of samples seems to be varying too widely:
>>
>> titan:~> for ((i=0;i<10;i++)); do perf record -f -e branches:u -c 1 true 2>/dev/null; perf report | head -1; done
>> # Samples: 28784
>> # Samples: 24063
>> # Samples: 22788
>> # Samples: 30449
>> # Samples: 15335
>> # Samples: 30557
>> # Samples: 24010
>> # Samples: 23866
>> # Samples: 24877
>> # Samples: 24330
>>
>> compared to the branch-stat itself:
>>
>> titan:~> perf stat -v --repeat 10 -e branches:u true
>> [ perf stat: executing run #1 ... ]
>> [ perf stat: executing run #2 ... ]
>> [ perf stat: executing run #3 ... ]
>> [ perf stat: executing run #4 ... ]
>> [ perf stat: executing run #5 ... ]
>> [ perf stat: executing run #6 ... ]
>> [ perf stat: executing run #7 ... ]
>> [ perf stat: executing run #8 ... ]
>> [ perf stat: executing run #9 ... ]
>> [ perf stat: executing run #10 ... ]
>>
>> Performance counter stats for 'true' (10 runs):
>>
>> 23851 branches ( +- 0.000% )
>>
>> 0.000639653 seconds time elapsed ( +- 2.474% )
>>
>> do we lose records in the recording?
>
>i doubt it's lost records. Even with SCHED_FIFO sampling and with a
>huge, 512 MB mmap ring-buffer we get a BTS sample count variation in
>the +- 10% range:
>
>titan:/home/mingo> for ((i=0;i<10;i++)); do perf record -r 1 -m 131072 -f -e branches:u -c 1 true 2>/dev/null; perf report | head -1; done
># Samples: 24860
># Samples: 24177
># Samples: 26165
># Samples: 25682
># Samples: 29175
># Samples: 23136
># Samples: 27102
># Samples: 29888
># Samples: 25524
># Samples: 24266
>
>(and these are all just user-mode executions)


Hmmm, we do get a single branch record for the transition from kernel to user mode.
We should therefore expect some deviation, but 10% sounds like too much to me.

perf record traces itself and then execs the application you want to trace.
So you would get a mix of perf's own trace and the actual application trace. That
could explain a 10% deviation.

perf stat, on the other hand, seems to go to some effort to trace only the
actual application.
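
To put a number on the deviation, here is a rough sketch that compares each
sample count from the quoted 'perf record -r 1 -m 131072' runs with the branch
count that perf stat reported. The counts are copied from the quoted output;
the names (samples, stat_branches, relative_excess) are made up for this
illustration and are not part of perf.

```python
# Sample counts copied from the quoted 'perf record -r 1 -m 131072' runs.
samples = [24860, 24177, 26165, 25682, 29175, 23136, 27102, 29888, 25524, 24266]

# Branch count copied from the quoted 'perf stat -e branches:u true' output.
stat_branches = 23851


def relative_excess(count, reference):
    """Return (count - reference) / reference as a fraction."""
    return (count - reference) / reference


# Per-run deviation of the BTS sample count from the perf stat count.
excess = [relative_excess(s, stat_branches) for s in samples]
mean_samples = sum(samples) / len(samples)

for s, e in zip(samples, excess):
    print(f"{s:6d} samples  {e:+7.1%} vs. perf stat")
print(f"mean: {mean_samples:.1f} samples "
      f"({relative_excess(mean_samples, stat_branches):+.1%} vs. perf stat)")
```

This only restates the quoted numbers in relative terms. It cannot say whether
the extra records come from perf tracing itself before the exec.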

What do you think?

In my tests, I look for a deterministic trace, e.g. on-the-spot branches for
for(;;) {} loops or a branch to 0 for a call through a null function pointer. I
have pretty much ignored the 'noise' around that.


regards,
markus.
