Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Addmissing user space support for config1/config2

From: arun
Date: Fri Apr 22 2011 - 12:59:22 EST


On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
>
> Using the generalized cache events i can run:
>
> $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
>
> Performance counter stats for './array' (10 runs):
>
> 6,719,130 cycles:u ( +- 0.662% )
> 5,084,792 instructions:u # 0.757 IPC ( +- 0.000% )
> 1,037,032 l1-dcache-loads:u ( +- 0.009% )
> 1,003,604 l1-dcache-load-misses:u ( +- 0.003% )
>
> 0.003802098 seconds time elapsed ( +- 13.395% )
>
> I consider that this is 'bad', because for almost every dcache-load there's a
> dcache-miss - a 99% L1 cache miss rate!

One could argue that all you need is cycles and instructions. If there is an
expensive load, you'll see that the load instruction takes many cycles and
you can infer that it's a cache miss.

Questions app developers typically ask me:

* If I fix all my top 5 L3 misses how much faster will my app go?
* Am I bottlenecked on memory bandwidth?
* I have 4 L3 misses every 1000 instructions and 15 branch mispredicts per
1000 instructions. Which one should I focus on?

It's hard to answer some of these without access to all events.
While your approach of having generic events for commonly used counters
might be useful for some use cases, I don't see why exposing all vendor
defined events is harmful.

A clear statement on the last point would be helpful.

-Arun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/