Re: [PATCH 4/4] perf_counter: update perf-top to use the new freqinterface

From: Ingo Molnar
Date: Fri May 15 2009 - 09:33:40 EST



* Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:

> Provide perf top -F as alternative to -c.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> CC: Paul Mackerras <paulus@xxxxxxxxx>
> CC: Corey Ashford <cjashfor@xxxxxxxxxxxxxxxxxx>
> ---
> Documentation/perf_counter/builtin-top.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/Documentation/perf_counter/builtin-top.c
> ===================================================================
> --- linux-2.6.orig/Documentation/perf_counter/builtin-top.c
> +++ linux-2.6/Documentation/perf_counter/builtin-top.c
> @@ -98,6 +98,7 @@ static unsigned int page_size;
> static unsigned int mmap_pages = 16;
> static int use_mmap = 0;
> static int use_munmap = 0;
> +static int freq = 0;
>
> static char *vmlinux;
>
> @@ -846,9 +847,10 @@ static void process_options(int argc, ch
> {"stat", no_argument, NULL, 'S'},
> {"vmlinux", required_argument, NULL, 'x'},
> {"zero", no_argument, NULL, 'z'},
> + {"freq", required_argument, NULL, 'F'},
> {NULL, 0, NULL, 0 }
> };
> - int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:r:s:Sx:zMU",
> + int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:r:s:Sx:zMUF:",
> long_options, &option_index);
> if (c == -1)
> break;
> @@ -889,6 +891,7 @@ static void process_options(int argc, ch
> case 'm': mmap_pages = atoi(optarg); break;
> case 'M': use_mmap = 1; break;
> case 'U': use_munmap = 1; break;
> + case 'F': freq = 1; default_interval = atoi(optarg); break;
> default: error = 1; break;
> }
> }
> @@ -1075,6 +1078,7 @@ int cmd_top(int argc, char **argv, const
> hw_event.nmi = nmi;
> hw_event.mmap = use_mmap;
> hw_event.munmap = use_munmap;
> + hw_event.freq = freq;
>
> fd[i][counter] = sys_perf_counter_open(&hw_event, tid, cpu, group_fd, 0);
> if (fd[i][counter] < 0) {

this frequency-based profiling is nice. It's a lot more untuitive to
users than rigid defaults of 'one IRQ per 100,000 cycles'.

So i think perf-top should be changed to have -F enabled by default,
with a default 10 KHz frequency for all counters.

But for that we need another fix for this: currently the histogram
is 'number of interrupts' based, which gets skewed with frequency
based profiling.

A correct sorting key would be a normalized histogram, along 'number
of hardware events', which could be measured as deltas between
interrupts, like this:

counter_val: 1200000 [ IRQ ] -> { 1200000, RIP-1 }
.
.
.
counter_val: 1250000 [ IRQ ] -> { 1250000, RIP-2 }
.
.
.
counter_val: 1260000 [ IRQ ] -> { 1260000, RIP-3 }

look at how the delta between the first and the second IRQ was 50000
cycles, while the delta between the second and third IRQ was just
10000 cycles - because the frequency adjustment code shortened the
period.

So in the histogram, RIP-2 should get 50,000 cycles, and RIP-3
should get 10,000 cycles.

With the current scheme both would get +1 event credited - which is
wrong.

Agreed?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/