Re: [tip:perfcounters/core] perf_counter: Implement generalized cache event types

From: Ingo Molnar
Date: Tue Jun 09 2009 - 08:15:45 EST



* Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:

> On Sat, 2009-06-06 at 11:16 +0000, tip-bot for Ingo Molnar wrote:
> > Commit-ID: 8326f44da090d6d304d29b9fdc7fb3e20889e329
> > Gitweb: http://git.kernel.org/tip/8326f44da090d6d304d29b9fdc7fb3e20889e329
> > Author: Ingo Molnar <mingo@xxxxxxx>
> > AuthorDate: Fri, 5 Jun 2009 20:22:46 +0200
> > Committer: Ingo Molnar <mingo@xxxxxxx>
> > CommitDate: Sat, 6 Jun 2009 13:14:47 +0200
> >
> > perf_counter: Implement generalized cache event types
> >
> > Extend generic event enumeration with the PERF_TYPE_HW_CACHE
> > method.
> >
> > This is a 3-dimensional space:
> >
> > { L1-D, L1-I, L2, ITLB, DTLB, BPU } x
> > { load, store, prefetch } x
> > { accesses, misses }
> >
> > User-space passes in the 3 coordinates and the kernel provides
> > a counter. (if the hardware supports that type and if the
> > combination makes sense.)
> >
> > Combinations that make no sense produce a -EINVAL.
> > Combinations that are not supported by the hardware produce -ENOTSUP.
> >
> > Extend the tools to deal with this, and rewrite the event symbol
> > parsing code with various popular aliases for the units and
> > access methods above. So 'l1-cache-miss' and 'l1d-read-ops' are
> > both valid aliases.
> >
> > ( x86 is supported for now, with the Nehalem event table filled in,
> > and with Core2 and Atom having placeholder tables. )
> >
>
> > +++ b/include/linux/perf_counter.h
> > @@ -28,6 +28,7 @@ enum perf_event_types {
> > PERF_TYPE_HARDWARE = 0,
> > PERF_TYPE_SOFTWARE = 1,
> > PERF_TYPE_TRACEPOINT = 2,
> > + PERF_TYPE_HW_CACHE = 3,
> >
> > /*
> > * available TYPE space, raw is the max value.
> > @@ -56,6 +57,39 @@ enum attr_ids {
> > };
> >
> > /*
> > + * Generalized hardware cache counters:
> > + *
> > + * { L1-D, L1-I, L2, LLC, ITLB, DTLB, BPU } x
> > + * { read, write, prefetch } x
> > + * { accesses, misses }
> > + */
> > +enum hw_cache_id {
> > + PERF_COUNT_HW_CACHE_L1D,
> > + PERF_COUNT_HW_CACHE_L1I,
> > + PERF_COUNT_HW_CACHE_L2,
> > + PERF_COUNT_HW_CACHE_DTLB,
> > + PERF_COUNT_HW_CACHE_ITLB,
> > + PERF_COUNT_HW_CACHE_BPU,
> > +
> > + PERF_COUNT_HW_CACHE_MAX,
> > +};
> > +
> > +enum hw_cache_op_id {
> > + PERF_COUNT_HW_CACHE_OP_READ,
> > + PERF_COUNT_HW_CACHE_OP_WRITE,
> > + PERF_COUNT_HW_CACHE_OP_PREFETCH,
> > +
> > + PERF_COUNT_HW_CACHE_OP_MAX,
> > +};
> > +
> > +enum hw_cache_op_result_id {
> > + PERF_COUNT_HW_CACHE_RESULT_ACCESS,
> > + PERF_COUNT_HW_CACHE_RESULT_MISS,
> > +
> > + PERF_COUNT_HW_CACHE_RESULT_MAX,
> > +};
>
> May I suggest we do the below instead? Some hardware doesn't make the
> read/write distinction and would therefore have an utterly empty table.
>
> Furthermore, also splitting the hit/miss into a bitfield allows us to
> have hit/miss and the combined value.
>
> ---
> diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
> index 3586df8..1fb72fc 100644
> --- a/include/linux/perf_counter.h
> +++ b/include/linux/perf_counter.h
> @@ -64,29 +64,32 @@ enum attr_ids {
> * { accesses, misses }
> */
> enum hw_cache_id {
> - PERF_COUNT_HW_CACHE_L1D,
> - PERF_COUNT_HW_CACHE_L1I,
> - PERF_COUNT_HW_CACHE_L2,
> - PERF_COUNT_HW_CACHE_DTLB,
> - PERF_COUNT_HW_CACHE_ITLB,
> - PERF_COUNT_HW_CACHE_BPU,
> + PERF_COUNT_HW_CACHE_L1D = 0,
> + PERF_COUNT_HW_CACHE_L1I = 1,
> + PERF_COUNT_HW_CACHE_L2 = 2,
> + PERF_COUNT_HW_CACHE_DTLB = 3,
> + PERF_COUNT_HW_CACHE_ITLB = 4,
> + PERF_COUNT_HW_CACHE_BPU = 5,

Could you please also rename 'L2' to LLC (last level cache)?

We want to know about the fastest and the 'largest' caches.
Intermediate caches are a lot less interesting in practice, and we
don't really want to enumerate a variable number of cache levels.

> PERF_COUNT_HW_CACHE_MAX,
> };
>
> enum hw_cache_op_id {
> - PERF_COUNT_HW_CACHE_OP_READ,
> - PERF_COUNT_HW_CACHE_OP_WRITE,
> - PERF_COUNT_HW_CACHE_OP_PREFETCH,
> + PERF_COUNT_HW_CACHE_OP_READ = 0x1,
> + PERF_COUNT_HW_CACHE_OP_WRITE = 0x2,
> + PERF_COUNT_HW_CACHE_OP_ACCESS = 0x3, /* either READ or WRITE */
> + PERF_COUNT_HW_CACHE_OP_PREFETCH = 0x4, /* XXX should we qualify this with either READ/WRITE? */

Btw., could you please also rename the constants to LOAD/STORE?
That's the proper PMU terminology.

Prefetches are almost always reads. That comes from the physical
fact that they can be done speculatively without modifying memory
state. A 'speculative write', while possible in theory, would have
so many side effects, and would complicate the SMP caching algorithm
and an in-order execution model so enormously, that I doubt it will
be done in any widespread way anytime soon.

Nevertheless, turning it into a bit does make sense, from an ABI
cleanliness POV.
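
A rough sketch of how the op bits might look with the LOAD/STORE
naming. The renamed constants and the hw_cache_op_valid() helper are
illustrative assumptions only, not something that exists in the tree:

```c
#include <stdbool.h>

/* Hypothetical naming: LOAD/STORE instead of READ/WRITE, one bit per op. */
enum hw_cache_op_id {
	PERF_COUNT_HW_CACHE_OP_LOAD	= 0x1,
	PERF_COUNT_HW_CACHE_OP_STORE	= 0x2,
	PERF_COUNT_HW_CACHE_OP_ACCESS	= 0x3,	/* LOAD | STORE */
	PERF_COUNT_HW_CACHE_OP_PREFETCH	= 0x4,

	PERF_COUNT_HW_CACHE_OP_MAX	= 0x8,
};

/*
 * Illustrative helper (assumption): accept any non-empty combination
 * of the known op bits, reject everything at or above OP_MAX.
 */
static bool hw_cache_op_valid(unsigned int op)
{
	return op != 0 && op < PERF_COUNT_HW_CACHE_OP_MAX;
}
```

With this layout, hardware that cannot tell loads from stores would
only need to fill in the OP_ACCESS slot of its event table.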

>
> - PERF_COUNT_HW_CACHE_OP_MAX,
> +
> + PERF_COUNT_HW_CACHE_OP_MAX = 0x8,
> };
>
> enum hw_cache_op_result_id {
> - PERF_COUNT_HW_CACHE_RESULT_ACCESS,
> - PERF_COUNT_HW_CACHE_RESULT_MISS,
> + PERF_COUNT_HW_CACHE_RESULT_HIT = 0x1,
> + PERF_COUNT_HW_CACHE_RESULT_MISS = 0x2,
> + PERF_COUNT_HW_CACHE_RESULT_SUM = 0x3,

RESULT_SUM sounds a bit weird - perhaps RESULT_ANY or RESULT_ALL?
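
To illustrate what the combined value would mean, here is a minimal
sketch assuming the RESULT_ANY name. The hw_cache_result_count()
helper is hypothetical and only spells out the intended semantics of
the mask, it is not how a PMU driver would obtain the numbers:

```c
/* Hypothetical naming: RESULT_ANY for the combined hit-or-miss value. */
enum hw_cache_op_result_id {
	PERF_COUNT_HW_CACHE_RESULT_HIT	= 0x1,
	PERF_COUNT_HW_CACHE_RESULT_MISS	= 0x2,
	PERF_COUNT_HW_CACHE_RESULT_ANY	= 0x3,	/* HIT | MISS */
};

/*
 * Illustrative only: given separately counted hits and misses,
 * return the count that a given result mask selects.
 */
static unsigned long long
hw_cache_result_count(unsigned int mask,
		      unsigned long long hits, unsigned long long misses)
{
	unsigned long long count = 0;

	if (mask & PERF_COUNT_HW_CACHE_RESULT_HIT)
		count += hits;
	if (mask & PERF_COUNT_HW_CACHE_RESULT_MISS)
		count += misses;

	return count;
}
```

So a request for RESULT_ANY is simply all accesses, which reads more
naturally than a 'sum'.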

Ingo