Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2

From: Stephane Eranian
Date: Fri Apr 22 2011 - 04:47:49 EST


On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Ingo Molnar <mingo@xxxxxxx> wrote:
>
>> This needs to be a *lot* more user friendly. Users do not want to type in
>> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> era really.
>>
>> Unless there's proper generalized and human usable support i'm leaning
>> towards turning off the offcore user-space accessible raw bits for now, and
>> use them only kernel-internally, for the cache events.
>
Generic cache events are a myth. They are not usable. I keep getting questions
from users because nobody knows what they are actually counting, thus nobody
knows how to interpret the counts. You cannot really hide the micro-architecture
if you want to make any sensible measurements.

I agree that perf's usability is poor when you have to pass hex values
for events. But that's why I have a user-level library that maps event
strings to event codes for perf. Arun Sharma posted a patch a while ago
to connect this library with perf; so far it seems to have been ignored.
With it you could write:
perf stat -e offcore_response_0:dmd_data_rd foo

> I'm about to push out the patch attached below - it lays out the arguments in
> detail. I don't think we have time to fix this properly for .39 - but memory
> profiling could be a nice feature for v2.6.40.
>
You will not be able to do any reasonable memory profiling using offcore
response events. Don't expect a profile to point to the loads that miss.
If you're lucky it will point to the use instruction.


> --------------------->
> From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@xxxxxxx>
> Date: Fri, 22 Apr 2011 08:44:38 +0200
> Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers
>
> Andi Kleen pointed out that the Intel offcore support patches were merged
> without user-space tool support to the functionality:
>
>  |
>  | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
>  | user space bits were not. This made it impossible to set the extra mask
>  | and actually do the OFFCORE profiling
>  |
>
> Andi submitted a preliminary patch for user-space support, as an
> extension to perf's raw event syntax:
>
>  |
>  | Some raw events -- like the Intel OFFCORE events -- support additional
>  | parameters. These can be appended after a ':'.
>  |
>  | For example on a multi socket Intel Nehalem:
>  |
>  |    perf stat -e r1b7:20ff -a sleep 1
>  |
>  | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
>  | that measures any access to DRAM on another socket.
>  |
>
> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.
>
> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.
>
> We already have such generalization in place for CPU cache events,
> and it's all very extensible.
>
> "Offcore" events measure general DRAM access patterns along various
> parameters. They are particularly useful in NUMA systems.
>
> We want to support them via generalized DRAM events: either as the
> fourth level of cache (after the last-level cache), or as a separate
> generalization category.
>
> That way user-space support would be very obvious, memory access
> profiling could be done via self-explanatory commands like:
>
>  perf record -e dram ./myapp
>  perf record -e dram-remote ./myapp
>
> ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
> accesses.
>
> These generalized events would work on all CPUs and architectures that
> have comparable PMU features.
>
> ( Note, these are just examples: actual implementation could have more
>  sophistication and more parameter - as long as they center around
>  similarly simple usecases. )
>
> Now we do not want to revert *all* of the current offcore bits, as they
> are still somewhat useful for generic last-level-cache events, implemented
> in this commit:
>
>  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
>
> But we definitely do not yet want to expose the unstructured raw events
> to user-space, until better generalization and usability is implemented
> for these hardware event features.
>
> ( Note: after generalization has been implemented raw offcore events can be
>  supported as well: there can always be an odd event that is marginally
>  useful but not useful enough to generalize. DRAM profiling is definitely
>  *not* such a category so generalization must be done first. )
>
> Furthermore, PERF_TYPE_RAW access to these registers was not intended
> to go upstream without proper support - it was a side-effect of the above
> e994d7d23a0b commit, not mentioned in the changelog.
>
> As v2.6.39 is nearing release we go for the simplest approach: disable
> the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
> kernel and becomes an ABI.
>
> Once proper structure is implemented for these hardware events and users
> are offered usable solutions we can revisit this issue.
>
> Reported-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@xxxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
>  arch/x86/kernel/cpu/perf_event.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index eed3673a..632e5dc 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
>                         return -EOPNOTSUPP;
>         }
>
> +       /*
> +        * Do not allow config1 (extended registers) to propagate,
> +        * there's no sane user-space generalization yet:
> +        */
>         if (attr->type == PERF_TYPE_RAW)
> -               return x86_pmu_extra_regs(event->attr.config, event);
> +               return 0;
>
>         if (attr->type == PERF_TYPE_HW_CACHE)
>                 return set_ext_hw_attr(hwc, event);
>
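
For reference, the raw syntax from Andi's patch quoted above
(r<hex>[:<hex>], e.g. r1b7:20ff) splits into two values: the event code
that goes into config and the extra mask that goes into config1. A
minimal sketch of that split follows. It is not the code from Andi's
patch; the struct and function names are hypothetical, and only the
r1b7:20ff syntax itself comes from this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical holder for the parsed values. With perf_event_attr these
 * would end up in the config and config1 fields.
 */
struct raw_event {
	uint64_t config;	/* event select + unit mask, e.g. 0x1b7 */
	uint64_t config1;	/* extra mask, e.g. 0x20ff; 0 if absent */
};

/* Parse "r<hex>" or "r<hex>:<hex>". Returns 0 on success, -1 on error. */
static int parse_raw_event(const char *spec, struct raw_event *out)
{
	const char *p;
	char *end;

	assert(spec != NULL && out != NULL);

	/* Must start with 'r' followed directly by a hex digit. */
	if (spec[0] != 'r' || !isxdigit((unsigned char)spec[1]))
		return -1;

	errno = 0;
	out->config = strtoull(spec + 1, &end, 16);
	if (errno)
		return -1;

	/* No ':' suffix: plain raw event, no extra mask. */
	out->config1 = 0;
	if (*end == '\0')
		return 0;
	if (*end != ':')
		return -1;

	/* The extra mask must be a hex number running to end of string. */
	p = end + 1;
	if (!isxdigit((unsigned char)*p))
		return -1;
	out->config1 = strtoull(p, &end, 16);
	if (errno || *end != '\0')
		return -1;

	return 0;
}
```

With this, "r1b7:20ff" yields config 0x1b7 and config1 0x20ff, and a
plain "r1b7" leaves config1 at 0.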