Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample

From: Stephane Eranian
Date: Thu May 20 2010 - 04:10:35 EST


Robert,

I still don't understand why you need all of this to encode IBS.
I still believe that with attr.config there are plenty of bits to choose
from. I do understand the need for PERF_SAMPLE_RAW. I think
there is no other way.

You simply need to pick an encoding to mark the config as IBS. You
need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS Op.
Regular counters use 43 bits, IBS Fetch uses 58, IBS Op uses 52.
So you could use bits 62-63, for instance. You don't need to encode
the sampling period in attr.config for either IBS mode. You can use
attr.sample_period, so you free up 16 bits.

I understand that IBS may evolve and thus may use more bits. But
you still have at least 16 bits of margin.
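
To make the encoding concrete, here is a minimal sketch of what I have
in mind. All names below (CFG_TYPE_SHIFT, cfg_encode(), and so on) are
made up for illustration and do not exist in any header, and bits 62-63
are only one possible choice:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, not an existing ABI: bits 62-63 of attr.config
 * select the kind of event, bits 0-61 carry the hardware payload. */
#define CFG_TYPE_SHIFT   62
#define CFG_TYPE_MASK    (UINT64_C(3) << CFG_TYPE_SHIFT)
#define CFG_PAYLOAD_MASK (~CFG_TYPE_MASK)

enum cfg_type {
	CFG_TYPE_PERFCTR   = 0,	/* 00: regular counters */
	CFG_TYPE_IBS_FETCH = 1,	/* 01: IBS Fetch */
	CFG_TYPE_IBS_OP    = 2,	/* 10: IBS Op */
};

/* Pack a type and a payload into one 64-bit config value.
 * Payload bits that would collide with the type field are dropped. */
static inline uint64_t cfg_encode(enum cfg_type type, uint64_t payload)
{
	assert(type <= CFG_TYPE_IBS_OP);
	return ((uint64_t)type << CFG_TYPE_SHIFT) | (payload & CFG_PAYLOAD_MASK);
}

/* Extract the two type bits. */
static inline unsigned int cfg_type_of(uint64_t config)
{
	return (unsigned int)(config >> CFG_TYPE_SHIFT);
}

/* Extract everything below the type bits. */
static inline uint64_t cfg_payload_of(uint64_t config)
{
	return config & CFG_PAYLOAD_MASK;
}

/* 11 is unused in this sketch, so a kernel would reject it. */
static inline int cfg_is_valid(uint64_t config)
{
	return cfg_type_of(config) <= CFG_TYPE_IBS_OP;
}
```

With such a layout, an IBS Fetch event would set attr.config to
cfg_encode(CFG_TYPE_IBS_FETCH, ...) and put the period into
attr.sample_period, leaving the payload bits to the hardware config.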

Users and tools would rely on a library to provide the event encoding.
No need to come up with some raw hex number on the cmdline.
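
As a rough illustration of the library side, a name lookup could be as
simple as the sketch below. The table, the event names and
event_name_to_config() are hypothetical, and the type values assume the
bits 62-63 layout suggested above rather than anything that exists
today:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: event type lives in bits 62-63 of attr.config. */
#define EV_TYPE_SHIFT 62

struct event_desc {
	const char *name;	/* what the user types */
	uint64_t    config;	/* what goes into attr.config */
};

/* Illustrative table only; a real library would carry the full
 * per-CPU event list and any extra payload bits. */
static const struct event_desc event_table[] = {
	{ "ibs_fetch", UINT64_C(1) << EV_TYPE_SHIFT },
	{ "ibs_op",    UINT64_C(2) << EV_TYPE_SHIFT },
};

/* Returns 0 and fills *config on success, -1 if the name is unknown. */
static int event_name_to_config(const char *name, uint64_t *config)
{
	size_t i;

	assert(name != NULL && config != NULL);

	for (i = 0; i < sizeof(event_table) / sizeof(event_table[0]); i++) {
		if (strcmp(name, event_table[i].name) == 0) {
			*config = event_table[i].config;
			return 0;
		}
	}
	return -1;
}
```

The tool passes the returned value straight into attr.config, so the
user only ever sees the symbolic name.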

On Wed, May 19, 2010 at 11:20 PM, Robert Richter <robert.richter@xxxxxxx> wrote:
> This patch introduces a method to specify the type of a raw sample.
> This can be used to set up hardware events other than generic
> performance counters by passing special config data to the pmu. The
> config data can be interpreted differently from generic events and thus
> can be used for other purposes.
>
> The raw_type attribute is an extension of the ABI. It reuses the
> unused bp_type space for this. Generic performance counters can be
> set up by setting the raw_type attribute to null. Thus special raw
> events must have a type other than null.
>
> Raw types can be defined as needed for cpu models or architectures.
> To keep backward compatibility all architectures must return an error
> for an event with a raw_type other than null that is not supported.
>
> E.g., raw_type can be used to set up IBS on an AMD cpu. IBS is not
> common to pmu features from other vendors or architectures. The pmu
> must be set up with a special config value. Sample data is returned in
> a certain format back to userland. An IBS event is created by
> setting a raw event and encoding the IBS type in raw_type. The pmu
> then handles this raw event and passes raw sample data back.
>
> Raw type could be architecture specific, e.g. for x86:
>
> enum perf_raw_type {
>         PERF_RAW_PERFCTR                = 0,
>         PERF_RAW_IBS_FETCH              = 1,
>         PERF_RAW_IBS_OP                 = 2,
>
>         PERF_RAW_MAX,
> };
>
> Null is the architecture's default, meaning for x86 a perfctr.
>
> Maybe the raw type definition could also be part of the ABI with one
> definition for all architectures.
>
> To use raw events with perf, the raw event syntax could be suffixed by
> the type (as for breakpoints):
>
>   -e rNNN[:TYPE]
>
> Example:
>
>  perf record -e r186A:1              # ... meaning IBS fetch, cycle count 100000
>  perf record -e r0:1 -c 100000       # ... the same
>
> Or with named types:
>
>  perf record -e r186A:IBS_FETCH ...
>  perf record -e r0:IBS_FETCH -c 100000 ...
>
> This solution has a number of advantages: A raw event type may be
> specified without encoding the type in the config value. The attr
> flags are not 'polluted'. We can follow the already existing
> breakpoint concept in syntax and encoding.
>
> Signed-off-by: Robert Richter <robert.richter@xxxxxxx>
> ---
>  arch/arm/kernel/perf_event.c             |    5 +++++
>  arch/powerpc/kernel/perf_event.c         |    2 ++
>  arch/powerpc/kernel/perf_event_fsl_emb.c |    2 ++
>  arch/sh/kernel/perf_event.c              |    2 ++
>  arch/x86/kernel/cpu/perf_event.c         |    5 ++++-
>  arch/x86/kernel/cpu/perf_event_amd.c     |    3 +++
>  arch/x86/kernel/cpu/perf_event_intel.c   |    3 +++
>  arch/x86/kernel/cpu/perf_event_p4.c      |    5 +++++
>  include/linux/perf_event.h               |    5 ++++-
>  9 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..73d680c 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -388,6 +388,11 @@ __hw_perf_event_init(struct perf_event *event)
>         } else if (PERF_TYPE_HW_CACHE == event->attr.type) {
>                 mapping = armpmu_map_cache_event(event->attr.config);
>         } else if (PERF_TYPE_RAW == event->attr.type) {
> +               if (event->attr.raw_type) {
> +                       pr_debug("invalid raw type %x\n",
> +                                event->attr.raw_type);
> +                       return -EINVAL;
> +               }
>                 mapping = armpmu->raw_event(event->attr.config);
>         } else {
>                 pr_debug("event type %x not supported\n", event->attr.type);
> diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
> index 43b83c3..c8fb3cf 100644
> --- a/arch/powerpc/kernel/perf_event.c
> +++ b/arch/powerpc/kernel/perf_event.c
> @@ -1036,6 +1036,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
>                         return ERR_PTR(err);
>                 break;
>         case PERF_TYPE_RAW:
> +               if (event->attr.raw_type)
> +                       return ERR_PTR(-EINVAL);
>                 ev = event->attr.config;
>                 break;
>         default:
> diff --git a/arch/powerpc/kernel/perf_event_fsl_emb.c b/arch/powerpc/kernel/perf_event_fsl_emb.c
> index 369872f..7547e96 100644
> --- a/arch/powerpc/kernel/perf_event_fsl_emb.c
> +++ b/arch/powerpc/kernel/perf_event_fsl_emb.c
> @@ -452,6 +452,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
>                 break;
>
>         case PERF_TYPE_RAW:
> +               if (event->attr.raw_type)
> +                       return ERR_PTR(-EINVAL);
>                 ev = event->attr.config;
>                 break;
>
> diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
> index 81b6de4..482cf48 100644
> --- a/arch/sh/kernel/perf_event.c
> +++ b/arch/sh/kernel/perf_event.c
> @@ -142,6 +142,8 @@ static int __hw_perf_event_init(struct perf_event *event)
>
>         switch (attr->type) {
>         case PERF_TYPE_RAW:
> +               if (attr->raw_type)
> +                       return -EINVAL;
>                 config = attr->config & sh_pmu->raw_event_mask;
>                 break;
>         case PERF_TYPE_HW_CACHE:
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index fd4db0d..3539b53 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -449,8 +449,11 @@ static int x86_setup_perfctr(struct perf_event *event)
>                         return -EOPNOTSUPP;
>         }
>
> -       if (attr->type == PERF_TYPE_RAW)
> +       if (attr->type == PERF_TYPE_RAW) {
> +               if (attr->raw_type)
> +                       return -EINVAL;
>                 return 0;
> +       }
>
>         if (attr->type == PERF_TYPE_HW_CACHE)
>                 return set_ext_hw_attr(hwc, attr);
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 611df11..87e5ae4 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -121,6 +121,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
>         if (event->attr.type != PERF_TYPE_RAW)
>                 return 0;
>
> +       if (event->attr.raw_type)
> +               return -EINVAL;
> +
>         event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
>
>         return 0;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index fdbc652..d15faf5 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -770,6 +770,9 @@ static int intel_pmu_hw_config(struct perf_event *event)
>         if (event->attr.type != PERF_TYPE_RAW)
>                 return 0;
>
> +       if (event->attr.raw_type)
> +               return -EINVAL;
> +
>         if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
>                 return 0;
>
> diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
> index 87e1803..1001892 100644
> --- a/arch/x86/kernel/cpu/perf_event_p4.c
> +++ b/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -437,6 +437,11 @@ static int p4_hw_config(struct perf_event *event)
>                 event->hw.config = p4_set_ht_bit(event->hw.config);
>
>         if (event->attr.type == PERF_TYPE_RAW) {
> +               /* only raw perfctr config supported */
> +               if (event->attr.raw_type) {
> +                       rc = -EINVAL;
> +                       goto out;
> +               }
>
>                 /* user data may have out-of-bound event index */
>                 evnt = p4_config_unpack_event(event->attr.config);
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index fe50347..f9d2d5e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -222,7 +222,10 @@ struct perf_event_attr {
>                 __u32           wakeup_watermark; /* bytes before wakeup   */
>         };
>
> -       __u32                   bp_type;
> +       union {
> +               __u32           bp_type;
> +               __u32           raw_type;
> +       };
>         __u64                   bp_addr;
>         __u64                   bp_len;
>  };
> --
> 1.7.1
>
>
>