Re: [RFC] perf_events: support for uncore a.k.a. nest units

From: Corey Ashford
Date: Tue Jan 19 2010 - 20:49:52 EST

On 1/19/2010 4:44 PM, Andi Kleen wrote:
On Tue, Jan 19, 2010 at 11:41:01AM -0800, Corey Ashford wrote:
4. How do you encode uncore events?
Uncore events will need to be encoded in the config field of the
perf_event_attr struct using the existing PERF_TYPE_RAW encoding. 64 bits
are available in the config field, and that may be sufficient to support
events on most systems. However, due to the proliferation and added
complexity of PMUs we envision, we might want to add another 64-bit config
(perhaps call it config_extra or config2) field to encode any extra
attributes that might be needed. The exact encoding used, just as for the
current encoding for core events, will be on a per-arch and possibly
per-system basis.
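
As a rough illustration of the two-field idea above, the sketch below splits an encoding wider than 64 bits across `config` and a second field. It is Python rather than kernel C; `config_extra` is only the name floated above (not an existing field), `RawEventAttr` is a hypothetical stand-in for the relevant part of perf_event_attr, and the 128-bit total width is an assumption made for the example.

```python
from dataclasses import dataclass

PERF_TYPE_RAW = 4            # value of PERF_TYPE_RAW in the kernel's perf_type_id enum
MASK64 = (1 << 64) - 1


@dataclass
class RawEventAttr:
    """Hypothetical stand-in for the relevant perf_event_attr fields."""
    type: int
    config: int              # low 64 bits of the encoding
    config_extra: int        # hypothetical second 64-bit field


def split_encoding(encoding: int) -> RawEventAttr:
    """Split an encoding of up to 128 bits into two 64-bit fields."""
    if encoding < 0 or encoding >> 128:
        raise ValueError("encoding does not fit in 128 bits")
    return RawEventAttr(PERF_TYPE_RAW, encoding & MASK64, encoding >> 64)


def join_encoding(attr: RawEventAttr) -> int:
    """Inverse of split_encoding()."""
    return (attr.config_extra << 64) | attr.config
```

Anything that fits in 64 bits leaves `config_extra` at zero, so existing raw encodings would be unaffected under this assumption.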

I don't think a raw hex number will scale anywhere. You'll need a human
readable event list / sub event masks with help texts.

Often uncore events have specific restrictions, and that needs
to be enforced somewhere too.

Doing that all in a clean way that is also usable
by programs likely needs a lot more thinking.

I left out one critical detail here: I had in mind that we'd be using a library like libpfm for handling the issue of event names + attributes to raw code translation. In fact, we are using libpfm today for this purpose in the PAPI/perf_events substrate implementation.
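
To make the name-to-raw-code idea concrete, here is a minimal sketch of the kind of table such a library might keep. This is not libpfm's API: the event names, codes, sub-event masks, the `max_pmu` restriction, and the `(umask << 8) | code` layout are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventDesc:
    code: int                                    # event select value (made up)
    help: str                                    # human-readable description
    umasks: dict = field(default_factory=dict)   # sub-event name -> mask bits
    max_pmu: int = 15                            # example restriction: highest PMU allowed


# Hypothetical event table; none of these are real uncore events.
EVENTS = {
    "MEM_READ": EventDesc(0x10, "memory controller reads",
                          {"LOCAL": 0x1, "REMOTE": 0x2}),
    "LINK_STALL": EventDesc(0x22, "cycles the link was stalled", max_pmu=1),
}


def encode(spec: str, pmu: int = 0) -> int:
    """Translate 'NAME[:UMASK...]' into a raw code, enforcing restrictions."""
    name, *masks = spec.split(":")
    if name not in EVENTS:
        raise ValueError(f"unknown event {name!r}")
    desc = EVENTS[name]
    if pmu > desc.max_pmu:
        raise ValueError(f"{name} cannot be counted on PMU {pmu}")
    umask = 0
    for mask in masks:
        if mask not in desc.umasks:
            raise ValueError(f"{name} has no sub-event {mask!r}")
        umask |= desc.umasks[mask]
    return (umask << 8) | desc.code
```

With this table, `encode("MEM_READ:LOCAL:REMOTE")` yields 0x310, while `encode("LINK_STALL", pmu=2)` is rejected, which is where per-event restrictions could be enforced in user space.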

bits      field
------    -----
3..0      PMU number 0-15 /* specifies which of several identical PMUs is being addressed */
7..4      core id 0-15
8..8      node id 0-1
11..9     chip id 0-7
16..12    blade id 0-31
23..17    rack id 0-127
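
A small sketch of how such a packed address could be built and taken apart, using the bit ranges from the table above. The function and field names are hypothetical; note that a 7-bit rack field holds values 0-127.

```python
# (name, low bit, high bit), taken from the table above
FIELDS = [
    ("pmu", 0, 3),
    ("core", 4, 7),
    ("node", 8, 8),
    ("chip", 9, 11),
    ("blade", 12, 16),
    ("rack", 17, 23),
]


def pack_address(**ids):
    """Pack per-level ids into one integer; unspecified ids default to 0."""
    unknown = set(ids) - {name for name, _, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    addr = 0
    for name, lo, hi in FIELDS:
        value = ids.get(name, 0)
        limit = (1 << (hi - lo + 1)) - 1
        if not 0 <= value <= limit:
            raise ValueError(f"{name} id {value} out of range 0-{limit}")
        addr |= value << lo
    return addr


def unpack_address(addr):
    """Recover the per-level ids from a packed address."""
    return {name: (addr >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, lo, hi in FIELDS}
```

Any id wider than its field is rejected, so a core id of 16 or more simply cannot be expressed with this layout.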

Such a compressed addressing scheme doesn't seem very future-proof.
E.g. 4 bits for the core id is already obsolete (see the "80 core chip" that
was recently announced).

Agreed. If the designer is very generous with the size of each field, it could hold up for quite a while, but there's still the problem of relating these addresses to actual hardware.

probably put something together for a particular system.

Addressing Option 2)

Have the kernel create nodes for each uncore PMU in /sys/devices/system or
other pseudo file system, such as the existing /proc/device-tree on Power
systems. /sys/devices/system or /proc/device-tree could be explored by the
user tool, and the user could then specify the path of the requested PMU
via a string which the kernel could interpret. To be overly simplistic,
something like "/sys/devices/system/pmus/blade4/cpu0/vectorcopro1". If we
settled on a common tree root to use, we could specify only the relative
path name, "blade4/cpu0/vectorcopro1".
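
A sketch of how a tool or the kernel side might normalize such a path string, accepting either the full or the relative form. The root below is just the example root used above and does not exist in sysfs today; `resolve_pmu_path` is a hypothetical helper.

```python
import posixpath

PMU_ROOT = "/sys/devices/system/pmus"   # hypothetical common tree root


def resolve_pmu_path(path, root=PMU_ROOT):
    """Return the absolute node path for a relative or absolute PMU path."""
    # posixpath.join() keeps an absolute second argument unchanged.
    full = posixpath.normpath(posixpath.join(root, path))
    if full == root:
        raise ValueError("path names the tree root, not a PMU")
    if not full.startswith(root + "/"):
        raise ValueError(f"{path!r} is outside {root}")
    return full
```

Both "blade4/cpu0/vectorcopro1" and the full path resolve to the same node, and anything that escapes the root via ".." is refused.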

That's a more workable scheme, but you still need to find a clean
way to describe topology (see above). The existing examples in sysfs
are unfortunately all clumsy imho.

Yes, I agree. Also, it's easy to construct a system design that doesn't have a hierarchical topology. A simple example would be a cluster of 32 nodes, each of which is connected to its 31 neighbors. Perhaps for the purposes of just enumerating PMUs a tree might be sufficient, but it's not clear to me that it is mathematically sufficient for all topologies, let alone intuitive enough to use. For example, highly interconnected components might require that PMU leaf nodes be duplicated in multiple branches, i.e. PMU paths might not be unique in some topologies.
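
To put numbers on the example above, here is a small sketch (all names hypothetical). A 32-node full mesh has 496 links, while any tree over 32 nodes has only 31, and even a 4-node mesh already offers several distinct loop-free routes to the node hosting a PMU.

```python
from itertools import combinations


def full_mesh(n):
    """Links of a cluster in which every node connects to every other node."""
    return list(combinations(range(n), 2))


def pmu_paths(links, root, pmu_host):
    """All loop-free routes from root to the node hosting the PMU."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    paths = []

    def walk(node, seen):
        if node == pmu_host:
            paths.append(seen)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                walk(nxt, seen + (nxt,))

    walk(root, (root,))
    return paths
```

`len(full_mesh(32))` is 496, and `pmu_paths(full_mesh(4), 0, 3)` returns five routes, so a path-based name for that PMU is not unique. The route count grows factorially with mesh size, so the enumeration is only practical for small examples.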

I'm certainly open to better alternatives!

Thanks for your thoughts,

- Corey
