Re: [GIT PULL 00/11] perf/core improvements and fixes

From: Jiri Olsa
Date: Thu Mar 03 2016 - 04:15:41 EST


On Thu, Mar 03, 2016 at 09:21:30AM +0100, Ingo Molnar wrote:
>
> * Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> > Hi Ingo,
> >
> > Please consider pulling,
> >
> > - Arnaldo
> >
> > The following changes since commit 675965b00d734c985e4285f5bec7e524d15fc4e1:
> >
> > perf: Export perf_event_sysfs_show() (2016-02-29 09:35:27 +0100)
> >
> > are available in the git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20160229
> >
> > for you to fetch changes up to 575a02e00b11eecbbabcb1eb22eab4c68e91ae77:
> >
> > perf record: Ensure return non-zero rc when mmap fail (2016-02-29 12:44:15 -0300)
> >
> > ----------------------------------------------------------------
> > perf/core improvements and fixes:
> >
> > User visible:
> >
> > - Check existence of frontend/backed stalled cycles in 'perf stat' (Andi Kleen)
> >
> > - Avoid installing .o files from tools/lib/ into the python extension (Jiri Olsa)
> >
> > - Rename the tracepoint '/format' field that carries the syscall ID from 'nr',
> > that is also the name of some syscalls arguments, to "__syscall_nr", to
> > avoid having multiple fields with the same name, that was breaking the
> > python script skeleton generator from perf.data files (Taeung Song)
> >
> > - Support converting data from bpf events in 'perf data' (Wang Nan)
> >
> > Infrastructure:
> >
> > - Split libtraceevent's pevent_print_event() (Steven Rostedt)
> >
> > - Librarize some 'perf record' bits to allow handling multiple perf.data
> > files per session (Wang Nan)
> >
> > - Ensure return non-zero rc when mmap fail in 'perf record' (Wang Nan)
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> >
> > ----------------------------------------------------------------
> > Andi Kleen (1):
> > perf stat: Check existence of frontend/backed stalled cycles
>
> >
> > Jiri Olsa (1):
> > perf tools: Fix python extension build
> >
> > Steven Rostedt (1):
> > tools lib traceevent: Split pevent_print_event() into specific functionality functions
> >
> > Taeung Song (2):
> > perf trace: Check and discard not only 'nr' but also '__syscall_nr'
> > tracing/syscalls: Rename "/format" tracepoint field name "nr" to "__syscall_nr:
> >
> > Wang Nan (6):
> > perf data: Support converting data from bpf_perf_event_output()
> > perf data: Explicitly set byte order for integer types
> > perf record: Use WARN_ONCE to replace 'if' condition
> > perf record: Extract synthesize code to record__synthesize()
> > perf record: Introduce record__finish_output() to finish a perf.data
> > perf record: Ensure return non-zero rc when mmap fail
> >
> > kernel/trace/trace_syscalls.c | 16 ++--
> > tools/lib/traceevent/event-parse.c | 136 +++++++++++++++++++++++-------
> > tools/lib/traceevent/event-parse.h | 13 +++
> > tools/perf/builtin-record.c | 168 ++++++++++++++++++++++---------------
> > tools/perf/builtin-stat.c | 22 ++++-
> > tools/perf/builtin-trace.c | 8 +-
> > tools/perf/util/data-convert-bt.c | 118 +++++++++++++++++++++++++-
> > tools/perf/util/setup.py | 4 +
> > 8 files changed, 372 insertions(+), 113 deletions(-)
>
> Hm, there's a 'perf stat' regression that I can see:
>
> Before:
>
> triton:~/tip> perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 11990.023100 task-clock (msec) # 11.981 CPUs utilized
> 8,802 context-switches # 0.734 K/sec
> 543 cpu-migrations # 0.045 K/sec
> 97,375 page-faults # 0.008 M/sec
> 9,854,385,894 cycles # 0.822 GHz
> 15,274,841,152 stalled-cycles-frontend # 155.01% frontend cycles idle
> <not supported> stalled-cycles-backend
> 9,634,486,137 instructions # 0.98 insn per cycle
> # 1.59 stalled cycles per insn
> 1,818,488,088 branches # 151.667 M/sec
> 46,365,120 branch-misses # 2.55% of all branches
>
> 1.000741599 seconds time elapsed
>
> After:
>
> triton:~/tip> perf stat -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 11989.280397 task-clock (msec) # 11.981 CPUs utilized
> 1299 context-switches # 0.108 K/sec
> 6 cpu-migrations # 0.001 K/sec
> 70 page-faults # 0.006 K/sec
> 127008602 cycles # 0.011 GHz
> 279538533 stalled-cycles-frontend # 220.09% frontend cycles idle
> 119213269 instructions # 0.94 insn per cycle
> # 2.34 stalled cycles per insn
> 24166678 branches # 2.016 M/sec
> 505681 branch-misses # 2.09% of all branches
>
> 1.000684278 seconds time elapsed
>
>
> ... see how the numbers became human-unreadable, losing the big-number separator?
>
> I suspect it's due to the following commit:
>
> fa184776ac27 perf stat: Check existence of frontend/backed stalled cycles

yea, it used the pmu parsing which screwes locales,
following patch fixed that for me..

jirka


---
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ce61f79dbaae..d8cd038baed2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -124,6 +124,17 @@ static int perf_pmu__parse_scale(struct perf_pmu_alias *alias, char *dir, char *
lc = setlocale(LC_NUMERIC, NULL);

/*
+ * The lc string may be allocated in static storage,
+ * so get a dynamic copy to make it survive setlocale
+ * call below.
+ */
+ lc = strdup(lc);
+ if (!lc) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ /*
* force to C locale to ensure kernel
* scale string is converted correctly.
* kernel uses default C locale.
@@ -135,6 +146,8 @@ static int perf_pmu__parse_scale(struct perf_pmu_alias *alias, char *dir, char *
/* restore locale */
setlocale(LC_NUMERIC, lc);

+ free((char *) lc);
+
ret = 0;
error:
close(fd);