Re: [PATCH 2/2 -tip] perf_counter: parse-events.c introduce aliasmember in event_symbol

From: Jaswinder Singh Rajput
Date: Mon Jun 22 2009 - 16:09:36 EST


On Tue, 2009-06-23 at 01:25 +0530, Jaswinder Singh Rajput wrote:
> On Mon, 2009-06-22 at 16:10 +0200, Ingo Molnar wrote:
> > yeah, something like that. I'd suggest to print out the actual
> > measured events:
> >
> > cache-references 10123 events
> > cache-misses 15 events
> >
> > and if something does not appear to be ticking then do something
> > like:
> >
> > cache-misses <inactive>
> >
> > I.e. 'perf test' could be a quick way both to users and to
> > developers to see all possible hw and sw events.
> >
> > Perhaps builtin-test.c should also do specific testcases for certain
> > counters - say intentionally migrate to a CPU and back to see the
> > CPU-migration count.
> >
> > Also, you seem to have copied builtin-stat.c, right? Try to
> > librarize as much of the functionality (into util/*) to make the
> > resulting linecount increase as small as possible.
> >
>
> perf test also needs a command to execute; otherwise it will just show a
> long list of <inactive> entries.
>
> I think it is better to support all events in perf stat, so the user can
> get better information from it, and we can also add some other testing
> options to it.
>
> Anyway, currently it looks like this:
>
> [RFC][PATCH] perf_counter tools: introduce perf test to test event for ticks

This version fixes some style issues:

[RFC][PATCH] perf_counter tools: introduce perf test to test event for ticks

perf test tests performance counter events. Its output on an AMD box:

./perf test -a -- ls -lR > /dev/null

Performance counter stats for 'ls' -lR:

cycles 1226819954
instructions 283680441
cache-references 144893559
cache-misses 3268438
branches 37488241
branch-misses 2464027
bus-cycles <inactive>
cpu-clock-msecs 17175506056
task-clock-msecs 17175086665
page-faults 488
minor-faults 488
major-faults <inactive>
context-switches 7956
CPU-migrations 7
L1-data-Cache-Load-Referencees 398303881
L1-data-Cache-Load-Misses 3552374
L1-data-Cache-Store-Referencees 270178
L1-data-Cache-Store-Misses <inactive>
L1-data-Cache-Prefetch-Referencees 611622
L1-data-Cache-Prefetch-Misses 399730
L1-instruction-Cache-Load-Referencees 124696447
L1-instruction-Cache-Load-Misses 2912802
L1-instruction-Cache-Store-Referencees <inactive>
L1-instruction-Cache-Store-Misses <inactive>
L1-instruction-Cache-Prefetch-Referencees 156576
L1-instruction-Cache-Prefetch-Misses <inactive>
L2-Cache-Load-Referencees 4312353
L2-Cache-Load-Misses 470382
L2-Cache-Store-Referencees 4392945
L2-Cache-Store-Misses <inactive>
L2-Cache-Prefetch-Referencees <inactive>
L2-Cache-Prefetch-Misses <inactive>
Data-TLB-Cache-Load-Referencees 127076487
Data-TLB-Cache-Load-Misses 1930048
Data-TLB-Cache-Store-Referencees <inactive>
Data-TLB-Cache-Store-Misses <inactive>
Data-TLB-Cache-Prefetch-Referencees <inactive>
Data-TLB-Cache-Prefetch-Misses <inactive>
Instruction-TLB-Cache-Load-Referencees 132768077
Instruction-TLB-Cache-Load-Misses 6406
Instruction-TLB-Cache-Store-Referencees <inactive>
Instruction-TLB-Cache-Store-Misses <inactive>
Instruction-TLB-Cache-Prefetch-Referencees <inactive>
Instruction-TLB-Cache-Prefetch-Misses <inactive>
Branch-Cache-Load-Referencees 58030210
Branch-Cache-Load-Misses 3257804
Branch-Cache-Store-Referencees <inactive>
Branch-Cache-Store-Misses <inactive>
Branch-Cache-Prefetch-Referencees <inactive>
Branch-Cache-Prefetch-Misses <inactive>

8.681671511 seconds time elapsed.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@xxxxxxxxx>
---
tools/perf/Documentation/perf-test.txt | 44 ++++
tools/perf/Makefile | 1 +
tools/perf/builtin-test.c | 436 ++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
6 files changed, 484 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/Documentation/perf-test.txt
create mode 100644 tools/perf/builtin-test.c
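
Note for reviewers (not part of the patch): the CHCACHE() macro in
builtin-test.c builds .config by OR-ing the cache id with the operation
shifted left by 8 and the result shifted left by 16. A minimal standalone
sketch of that packing follows. The enum names and their numeric values are
local stand-ins that I assume mirror the PERF_COUNT_HW_CACHE_* constants, and
cache_config() and the config_cache_*() helpers are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-ins for the PERF_COUNT_HW_CACHE_* enums. The values are
 * assumed to match the kernel header; they are not taken from this patch.
 */
enum cache_id     { CACHE_L1D, CACHE_L1I, CACHE_LL, CACHE_DTLB, CACHE_ITLB, CACHE_BPU };
enum cache_op     { CACHE_OP_READ, CACHE_OP_WRITE, CACHE_OP_PREFETCH };
enum cache_result { CACHE_RESULT_ACCESS, CACHE_RESULT_MISS };

/*
 * Pack the three fields the way CHCACHE() does:
 * id | (op << 8) | (result << 16), assuming each field fits in 8 bits.
 */
static uint64_t cache_config(enum cache_id id, enum cache_op op,
			     enum cache_result result)
{
	assert((unsigned int)id <= 0xff);
	assert((unsigned int)op <= 0xff);
	assert((unsigned int)result <= 0xff);

	return (uint64_t)id | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}

/* Inverse helpers: pull each 8-bit field back out of a packed config. */
static unsigned int config_cache_id(uint64_t config)
{
	return (unsigned int)(config & 0xff);
}

static unsigned int config_cache_op(uint64_t config)
{
	return (unsigned int)((config >> 8) & 0xff);
}

static unsigned int config_cache_result(uint64_t config)
{
	return (unsigned int)((config >> 16) & 0xff);
}
```

With these stand-in values, cache_config(CACHE_DTLB, CACHE_OP_WRITE,
CACHE_RESULT_MISS) evaluates to 0x10103.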

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
new file mode 100644
index 0000000..6233769
--- /dev/null
+++ b/tools/perf/Documentation/perf-test.txt
@@ -0,0 +1,44 @@
+perf-test(1)
+============
+
+NAME
+----
+perf-test - Run a command and gather performance counter event counts, if any
+
+SYNOPSIS
+--------
+[verse]
+'perf test' [-e <EVENT> | --event=EVENT] [-a] <command>
+'perf test' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
+
+DESCRIPTION
+-----------
+This command runs a command and gathers performance counter event counts
+from it.
+
+
+OPTIONS
+-------
+<command>...::
+ Any command you can specify in a shell.
+
+
+-e::
+--event=::
+ Select the PMU event. Selection can be a symbolic event name
+ (use 'perf list' to list all events) or a raw PMU
+ event (eventsel+umask) in the form of rNNN where NNN is a
+ hexadecimal event descriptor.
+
+-a::
+ system-wide collection
+
+EXAMPLES
+--------
+
+$ perf test -- make -j
+
+
+SEE ALSO
+--------
+linkperf:perf-stat[1], linkperf:perf-top[1], linkperf:perf-list[1]
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 36d7eef..f5ac83f 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -335,6 +335,7 @@ BUILTIN_OBJS += builtin-list.o
BUILTIN_OBJS += builtin-record.o
BUILTIN_OBJS += builtin-report.o
BUILTIN_OBJS += builtin-stat.o
+BUILTIN_OBJS += builtin-test.o
BUILTIN_OBJS += builtin-top.o

PERFLIBS = $(LIB_FILE)
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
new file mode 100644
index 0000000..3b24b2d
--- /dev/null
+++ b/tools/perf/builtin-test.c
@@ -0,0 +1,436 @@
+/*
+ * builtin-test.c
+ *
+ * Builtin test command: Test performance counter events
+ *
+ * Sample output on AMD box:
+
+ $ perf test -a -- ls -lR > /dev/null
+
+ Performance counter stats for 'ls' -lR:
+
+ cycles 1226819954
+ instructions 283680441
+ cache-references 144893559
+ cache-misses 3268438
+ branches 37488241
+ branch-misses 2464027
+ bus-cycles <inactive>
+ cpu-clock-msecs 17175506056
+ task-clock-msecs 17175086665
+ page-faults 488
+ minor-faults 488
+ major-faults <inactive>
+ context-switches 7956
+ CPU-migrations 7
+ L1-data-Cache-Load-Referencees 398303881
+ L1-data-Cache-Load-Misses 3552374
+ L1-data-Cache-Store-Referencees 270178
+ L1-data-Cache-Store-Misses <inactive>
+ L1-data-Cache-Prefetch-Referencees 611622
+ L1-data-Cache-Prefetch-Misses 399730
+ L1-instruction-Cache-Load-Referencees 124696447
+ L1-instruction-Cache-Load-Misses 2912802
+ L1-instruction-Cache-Store-Referencees <inactive>
+ L1-instruction-Cache-Store-Misses <inactive>
+ L1-instruction-Cache-Prefetch-Referencees 156576
+ L1-instruction-Cache-Prefetch-Misses <inactive>
+ L2-Cache-Load-Referencees 4312353
+ L2-Cache-Load-Misses 470382
+ L2-Cache-Store-Referencees 4392945
+ L2-Cache-Store-Misses <inactive>
+ L2-Cache-Prefetch-Referencees <inactive>
+ L2-Cache-Prefetch-Misses <inactive>
+ Data-TLB-Cache-Load-Referencees 127076487
+ Data-TLB-Cache-Load-Misses 1930048
+ Data-TLB-Cache-Store-Referencees <inactive>
+ Data-TLB-Cache-Store-Misses <inactive>
+ Data-TLB-Cache-Prefetch-Referencees <inactive>
+ Data-TLB-Cache-Prefetch-Misses <inactive>
+ Instruction-TLB-Cache-Load-Referencees 132768077
+ Instruction-TLB-Cache-Load-Misses 6406
+ Instruction-TLB-Cache-Store-Referencees <inactive>
+ Instruction-TLB-Cache-Store-Misses <inactive>
+ Instruction-TLB-Cache-Prefetch-Referencees <inactive>
+ Instruction-TLB-Cache-Prefetch-Misses <inactive>
+ Branch-Cache-Load-Referencees 58030210
+ Branch-Cache-Load-Misses 3257804
+ Branch-Cache-Store-Referencees <inactive>
+ Branch-Cache-Store-Misses <inactive>
+ Branch-Cache-Prefetch-Referencees <inactive>
+ Branch-Cache-Prefetch-Misses <inactive>
+
+ 8.681671511 seconds time elapsed.
+
+ * (based on builtin-stat.c)
+ *
+ * Copyright (C) 2008, Red Hat Inc, Ingo Molnar <mingo@xxxxxxxxxx>
+ * Copyright (C) 2009, Jaswinder Singh Rajput <jaswinder@xxxxxxxxxx>
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include "perf.h"
+#include "builtin.h"
+#include "util/util.h"
+#include "util/parse-options.h"
+#include "util/parse-events.h"
+
+#include <sys/prctl.h>
+#include <math.h>
+
+#define CHW(x) .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_##x
+#define CSW(x) .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_##x
+#define CHCACHE(x, y, z) \
+.type = PERF_TYPE_HW_CACHE, \
+.config = (PERF_COUNT_HW_CACHE_##x | (PERF_COUNT_HW_CACHE_OP_##y << 8) |\
+ (PERF_COUNT_HW_CACHE_RESULT_##z << 16))
+
+static struct perf_counter_attr default_attrs[] = {
+/* Generalized Hardware events */
+ { CHW(CPU_CYCLES) },
+ { CHW(INSTRUCTIONS) },
+ { CHW(CACHE_REFERENCES) },
+ { CHW(CACHE_MISSES) },
+ { CHW(BRANCH_INSTRUCTIONS) },
+ { CHW(BRANCH_MISSES) },
+ { CHW(BUS_CYCLES) },
+
+/* Generalized Software events */
+ { CSW(CPU_CLOCK) },
+ { CSW(TASK_CLOCK) },
+ { CSW(PAGE_FAULTS) },
+ { CSW(PAGE_FAULTS_MIN) },
+ { CSW(PAGE_FAULTS_MAJ) },
+ { CSW(CONTEXT_SWITCHES) },
+ { CSW(CPU_MIGRATIONS) },
+
+/* Generalized Hardware cache events */
+ { CHCACHE(L1D, READ, ACCESS) },
+ { CHCACHE(L1D, READ, MISS) },
+ { CHCACHE(L1D, WRITE, ACCESS) },
+ { CHCACHE(L1D, WRITE, MISS) },
+ { CHCACHE(L1D, PREFETCH, ACCESS) },
+ { CHCACHE(L1D, PREFETCH, MISS) },
+
+ { CHCACHE(L1I, READ, ACCESS) },
+ { CHCACHE(L1I, READ, MISS) },
+ { CHCACHE(L1I, WRITE, ACCESS) },
+ { CHCACHE(L1I, WRITE, MISS) },
+ { CHCACHE(L1I, PREFETCH, ACCESS) },
+ { CHCACHE(L1I, PREFETCH, MISS) },
+
+ { CHCACHE(LL, READ, ACCESS) },
+ { CHCACHE(LL, READ, MISS) },
+ { CHCACHE(LL, WRITE, ACCESS) },
+ { CHCACHE(LL, WRITE, MISS) },
+ { CHCACHE(LL, PREFETCH, ACCESS) },
+ { CHCACHE(LL, PREFETCH, MISS) },
+
+ { CHCACHE(DTLB, READ, ACCESS) },
+ { CHCACHE(DTLB, READ, MISS) },
+ { CHCACHE(DTLB, WRITE, ACCESS) },
+ { CHCACHE(DTLB, WRITE, MISS) },
+ { CHCACHE(DTLB, PREFETCH, ACCESS) },
+ { CHCACHE(DTLB, PREFETCH, MISS) },
+
+ { CHCACHE(ITLB, READ, ACCESS) },
+ { CHCACHE(ITLB, READ, MISS) },
+ { CHCACHE(ITLB, WRITE, ACCESS) },
+ { CHCACHE(ITLB, WRITE, MISS) },
+ { CHCACHE(ITLB, PREFETCH, ACCESS) },
+ { CHCACHE(ITLB, PREFETCH, MISS) },
+
+ { CHCACHE(BPU, READ, ACCESS) },
+ { CHCACHE(BPU, READ, MISS) },
+ { CHCACHE(BPU, WRITE, ACCESS) },
+ { CHCACHE(BPU, WRITE, MISS) },
+ { CHCACHE(BPU, PREFETCH, ACCESS) },
+ { CHCACHE(BPU, PREFETCH, MISS) },
+
+};
+
+#define MAX_RUN 100
+
+static int system_wide = 0;
+static int verbose = 0;
+
+static int nr_cpus = 0;
+
+static int run_count = 1;
+static int run_idx = 0;
+
+static unsigned int page_size;
+
+static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+
+static u64 event_res[MAX_RUN][MAX_COUNTERS][3];
+
+static u64 walltime_nsecs[MAX_RUN];
+static u64 runtime_cycles[MAX_RUN];
+
+static u64 event_res_avg[MAX_COUNTERS][3];
+
+static u64 walltime_nsecs_avg;
+
+static u64 runtime_cycles_avg;
+
+static void create_perf_stat_counter(int counter)
+{
+ struct perf_counter_attr *attr = attrs + counter;
+
+ if (system_wide) {
+ int cpu;
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ fd[cpu][counter] = sys_perf_counter_open(attr, -1, cpu, -1, 0);
+ if (fd[cpu][counter] < 0 && verbose) {
+ printf("Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n", counter, fd[cpu][counter], strerror(errno));
+ }
+ }
+ } else {
+ attr->disabled = 1;
+
+ fd[0][counter] = sys_perf_counter_open(attr, 0, -1, -1, 0);
+ if (fd[0][counter] < 0 && verbose) {
+ printf("Error: counter %d, sys_perf_counter_open() syscall returned with %d (%s)\n", counter, fd[0][counter], strerror(errno));
+ }
+ }
+}
+
+/*
+ * Read out the results of a single counter:
+ */
+static void read_counter(int counter)
+{
+ u64 *count, single_count[3];
+ ssize_t res;
+ int cpu, nv;
+
+ count = event_res[run_idx][counter];
+
+ count[0] = count[1] = count[2] = 0;
+
+ nv = 1;
+ for (cpu = 0; cpu < nr_cpus; cpu++) {
+ if (fd[cpu][counter] < 0)
+ continue;
+
+ res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
+ assert(res == nv * sizeof(u64));
+ close(fd[cpu][counter]);
+ fd[cpu][counter] = -1;
+
+ count[0] += single_count[0];
+ }
+
+ /*
+ * Save the full runtime - to allow normalization during printout:
+ */
+ runtime_cycles[run_idx] = count[0];
+}
+
+static int run_perf_test(int argc, const char **argv)
+{
+ unsigned long long t0, t1;
+ int status = 0;
+ int counter;
+ int pid;
+
+ if (!system_wide)
+ nr_cpus = 1;
+
+ for (counter = 0; counter < nr_counters; counter++)
+ create_perf_stat_counter(counter);
+
+ /*
+ * Enable counters and exec the command:
+ */
+ t0 = rdclock();
+ prctl(PR_TASK_PERF_COUNTERS_ENABLE);
+
+ if ((pid = fork()) < 0)
+ perror("failed to fork");
+
+ if (!pid) {
+ if (execvp(argv[0], (char **)argv)) {
+ perror(argv[0]);
+ exit(-1);
+ }
+ }
+
+ wait(&status);
+
+ prctl(PR_TASK_PERF_COUNTERS_DISABLE);
+ t1 = rdclock();
+
+ walltime_nsecs[run_idx] = t1 - t0;
+
+ for (counter = 0; counter < nr_counters; counter++)
+ read_counter(counter);
+
+ return WEXITSTATUS(status);
+}
+
+static void test_printout(int counter, u64 *count)
+{
+ fprintf(stderr, " %-45s", event_name(counter));
+
+ if (count[0])
+ fprintf(stderr, " %14Ld", count[0]);
+ else
+ fprintf(stderr, " <inactive>");
+}
+
+/*
+ * Print out the results of a single counter:
+ */
+static void print_counter(int counter)
+{
+ u64 *count;
+
+ count = event_res_avg[counter];
+
+ test_printout(counter, count);
+
+ fprintf(stderr, "\n");
+}
+
+static void update_avg(const char *name, int idx, u64 *avg, u64 *val)
+{
+ *avg += *val;
+
+ if (verbose > 1)
+ fprintf(stderr, "debug: %20s[%d]: %Ld\n", name, idx, *val);
+}
+/*
+ * Calculate the averages:
+ */
+static void calc_avg(void)
+{
+ int i, j;
+
+ if (verbose > 1)
+ fprintf(stderr, "\n");
+
+ for (i = 0; i < run_count; i++) {
+ update_avg("walltime", 0, &walltime_nsecs_avg, walltime_nsecs + i);
+ update_avg("runtime_cycles", 0, &runtime_cycles_avg, runtime_cycles + i);
+ for (j = 0; j < nr_counters; j++) {
+ update_avg("counter/0", j,
+ event_res_avg[j]+0, event_res[i][j]+0);
+ update_avg("counter/1", j,
+ event_res_avg[j]+1, event_res[i][j]+1);
+ update_avg("counter/2", j,
+ event_res_avg[j]+2, event_res[i][j]+2);
+ }
+ }
+ walltime_nsecs_avg /= run_count;
+ runtime_cycles_avg /= run_count;
+
+ for (j = 0; j < nr_counters; j++) {
+ event_res_avg[j][0] /= run_count;
+ event_res_avg[j][1] /= run_count;
+ event_res_avg[j][2] /= run_count;
+ }
+}
+
+static void print_test(int argc, const char **argv)
+{
+ int i, counter;
+
+ calc_avg();
+
+ fflush(stdout);
+
+ fprintf(stderr, "\n");
+ fprintf(stderr, " Performance counter stats for \'%s\'", argv[0]);
+
+ for (i = 1; i < argc; i++)
+ fprintf(stderr, " %s", argv[i]);
+
+ fprintf(stderr, ":\n\n");
+
+ for (counter = 0; counter < nr_counters; counter++)
+ print_counter(counter);
+
+ fprintf(stderr, "\n");
+ fprintf(stderr, " %14.9f seconds time elapsed.\n",
+ (double)walltime_nsecs_avg/1e9);
+ fprintf(stderr, "\n");
+}
+
+static volatile int signr = -1;
+
+static void skip_signal(int signo)
+{
+ signr = signo;
+}
+
+static const char * const test_usage[] = {
+ "perf test [<options>] <command>",
+ NULL
+};
+
+static void sig_atexit(void)
+{
+ if (signr == -1)
+ return;
+
+ signal(signr, SIG_DFL);
+ kill(getpid(), signr);
+}
+
+static const struct option options[] = {
+ OPT_CALLBACK('e', "event", NULL, "event",
+ "event selector. use 'perf list' to list available events",
+ parse_events),
+ OPT_BOOLEAN('a', "all-cpus", &system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show counter open errors, etc)"),
+ OPT_END()
+};
+
+int cmd_test(int argc, const char **argv, const char *prefix)
+{
+ int status;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ memcpy(attrs, default_attrs, sizeof(attrs));
+
+ argc = parse_options(argc, argv, options, test_usage, 0);
+ if (!argc)
+ usage_with_options(test_usage, options);
+ if (run_count <= 0 || run_count > MAX_RUN)
+ usage_with_options(test_usage, options);
+
+ if (!nr_counters)
+ nr_counters = ARRAY_SIZE(default_attrs);
+
+ nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+ assert(nr_cpus <= MAX_NR_CPUS);
+ assert(nr_cpus >= 0);
+
+ /*
+ * We don't want to block the signals - that would cause
+ * child tasks to inherit that and Ctrl-C would not work.
+ * What we want is for Ctrl-C to work in the exec()-ed
+ * task, but to be ignored by perf test itself:
+ */
+ atexit(sig_atexit);
+ signal(SIGINT, skip_signal);
+ signal(SIGALRM, skip_signal);
+ signal(SIGABRT, skip_signal);
+
+ status = 0;
+ for (run_idx = 0; run_idx < run_count; run_idx++) {
+ if (run_count != 1 && verbose)
+ fprintf(stderr, "[ perf test: executing run #%d ... ]\n", run_idx+1);
+ status = run_perf_test(argc, argv);
+ }
+
+ print_test(argc, argv);
+
+ return status;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 51d1682..3ed0362 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -22,5 +22,6 @@ extern int cmd_stat(int argc, const char **argv, const char *prefix);
extern int cmd_top(int argc, const char **argv, const char *prefix);
extern int cmd_version(int argc, const char **argv, const char *prefix);
extern int cmd_list(int argc, const char **argv, const char *prefix);
+extern int cmd_test(int argc, const char **argv, const char *prefix);

#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index eebce30..f53544c 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -7,4 +7,5 @@ perf-list mainporcelain common
perf-record mainporcelain common
perf-report mainporcelain common
perf-stat mainporcelain common
+perf-test mainporcelain common
perf-top mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 4eb7259..9f98f5e 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -262,6 +262,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "record", cmd_record, 0 },
{ "report", cmd_report, 0 },
{ "stat", cmd_stat, 0 },
+ { "test", cmd_test, 0 },
{ "top", cmd_top, 0 },
{ "annotate", cmd_annotate, 0 },
{ "version", cmd_version, 0 },
--
1.6.0.6
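
On the <inactive> reporting: calc_avg() sums each counter over the runs and
divides by run_count, and test_printout() prints <inactive> whenever the
resulting count is zero. A rough standalone sketch of that rule is below. The
helper names average_count() and format_event() are hypothetical, and
snprintf() stands in for the fprintf(stderr, ...) calls in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Integer mean over the runs, in the style of calc_avg(): sum, then divide. */
static uint64_t average_count(const uint64_t *per_run, int run_count)
{
	uint64_t sum = 0;
	int i;

	assert(run_count > 0);

	for (i = 0; i < run_count; i++)
		sum += per_run[i];

	return sum / (uint64_t)run_count;
}

/*
 * Format one result line in the style of test_printout(): a zero count
 * is reported as "<inactive>", anything else as a right-aligned number.
 * Returns the number of characters snprintf() would have written.
 */
static int format_event(char *buf, size_t len, const char *name, uint64_t count)
{
	if (count)
		return snprintf(buf, len, " %-45s %14llu", name,
				(unsigned long long)count);

	return snprintf(buf, len, " %-45s <inactive>", name);
}
```

Because the mean is an integer division, a counter whose total over all runs
is smaller than run_count would also be shown as <inactive>. With run_count
fixed at 1, as in this patch, that cannot happen.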


