Re: [PATCH v4] perf tools: Add ARM Statistical Profiling Extensions (SPE) support

From: Adrian Hunter
Date: Thu Jan 11 2018 - 09:18:35 EST


On 22/11/17 01:33, Kim Phillips wrote:
> 'perf record' and 'perf report --dump-raw-trace' supported in this
> release.
>
> Example usage:
>
> $ ./perf record -e arm_spe_0/ts_enable=1,pa_enable=1/ \
> dd if=/dev/zero of=/dev/null count=10000
>
> perf report --dump-raw-trace
>
> Note that the perf.data file is portable, so the report can be run on
> another architecture host if necessary.
>
> Output will contain raw SPE data and its textual representation, such
> as:
>
> 0x550 [0x30]: PERF_RECORD_AUXTRACE size: 0xc408 offset: 0 ref: 0x30005619 idx: 3 tid: 2109 cpu: 3
> .
> . ... ARM SPE data: size 50184 bytes
> . 00000000: 49 00 LD
> . 00000002: b2 00 9c 7b 7a 00 80 ff ff VA 0xffff80007a7b9c00
> . 0000000b: 9a 00 00 LAT 0 XLAT
> . 0000000e: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
> . 00000010: b0 b0 c9 15 08 00 00 ff ff PC 0xff00000815c9b0 el3 ns=1
> . 00000019: 98 00 00 LAT 0 TOT
> . 0000001c: 71 00 20 fa fd 16 00 00 00 TS 98750308352
> . 00000025: 49 01 ST
> . 00000027: b2 60 bc 0c 0f 00 00 ff ff VA 0xffff00000f0cbc60
> . 00000030: 9a 00 00 LAT 0 XLAT
> . 00000033: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
> . 00000035: b0 48 cc 15 08 00 00 ff ff PC 0xff00000815cc48 el3 ns=1
> . 0000003e: 98 00 00 LAT 0 TOT
> . 00000041: 71 00 20 fa fd 16 00 00 00 TS 98750308352
> . 0000004a: 48 00 INSN-OTHER
> . 0000004c: 42 02 EV RETIRED
> . 0000004e: b0 ac 47 0c 08 00 00 ff ff PC 0xff0000080c47ac el3 ns=1
> . 00000057: 98 00 00 LAT 0 TOT
> . 0000005a: 71 00 20 fa fd 16 00 00 00 TS 98750308352
> . 00000063: 49 00 LD
> . 00000065: b2 18 48 e5 7a 00 80 ff ff VA 0xffff80007ae54818
> . 0000006e: 9a 00 00 LAT 0 XLAT
> . 00000071: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
> . 00000073: b0 08 f8 15 08 00 00 ff ff PC 0xff00000815f808 el3 ns=1
> . 0000007c: 98 00 00 LAT 0 TOT
> . 0000007f: 71 00 20 fa fd 16 00 00 00 TS 98750308352
> ...
>
> Other release notes:
>
> - applies to acme's perf/{core,urgent} branches, likely elsewhere
>
> - Report is self-contained within the tool. Record requires enabling
> the kernel SPE driver by setting CONFIG_ARM_SPE_PMU.
>
> - the intel-bts implementation was used as a starting point; its
> min/default/max buffer sizes and power of 2 pages granularity need to be
> revisited for ARM SPE
>
> - recording across multiple SPE clusters/domains not supported
>
> - snapshot support (record -S), and conversion to native perf events
> (e.g., via 'perf inject --itrace'), are also not supported
>
> - technically both cs-etm and spe can be used simultaneously, however
> disabled for simplicity in this release
>
> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>

For what is there now, it looks fine from the auxtrace point of view. There
are a couple of minor points below but nevertheless:

Acked-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>

> ---
> v4: rebased onto acme's perf/core, whitespace fixes.
>
> v3: trying to address comments from v2:
>
> - despite adding a find_all_arm_spe_pmus() function to scan for all
> arm_spe_<n> device instances, in order to ensure auxtrace_record__init
> successfully matches the evsel type with the correct arm_spe_pmu type,
> I am still having trouble running in multi-SPE PPI (heterogeneous)
> environments (mmap fails with EOPNOTSUPP, as does running with
> --per-thread on homogeneous systems).
>
> - arm_spe_reference: use gettime instead of direct cntvct register access
>
> - spe-decoder: add a comment for why SPE_EVENTS code sets packet->index.
>
> - added arm_spe_pmu_default_config that accesses the driver
> caps/min_interval and sets the default sampling period to it. This way
> users don't have to specify -c explicitly. Also set is_uncore to false.
>
> - set more sampling bits in the arm_spe and its tracking evsel. Still
> unsure if too liberal, and not sure whether it needs another context
> switch tracking evsel. Comments welcome!
>
> - https://www.spinics.net/lists/arm-kernel/msg614361.html
>
> v2: mostly addressing Mark Rutland's comments as much as possible without his
> feedback to my feedback:
>
> - decoder refactored with a get_payload, not extended to with-ext_len ones like
> get_addr, named the constants
>
> - 0x-ified %x output formats, but decided to not sign extend the addresses in
> the raw dump, rather do so if necessary in the synthesis stage:
> SPE implementations differ in this area, and raw dump should reflect that.
>
> - CPU mask / new record behaviour bisected to commit e3ba76deef23064 "perf
> tools: Force uncore events to system wide monitoring". Waiting to hear back
> on why driver can't do system wide monitoring, even across PPIs, by e.g.,
> sharing the SPE interrupts in one handler (SPE's don't differ in this record
> regard).
>
> - addressed off-list comment from M. Williams:
> "Instruction Type" packet was renamed as "Operation Type".
> so in the spe packet decoder: INSN_TYPE -> OP_TYPE
>
> - do_get_packet fixed to handle excessive, successive PADding from a new source
> of raw SPE data, so instead of:
>
> . 000011ae: 00 PAD
> . 000011af: 00 PAD
> . 000011b0: 00 PAD
> . 000011b1: 00 PAD
> . 000011b2: 00 PAD
> . 000011b3: 00 PAD
> . 000011b4: 00 PAD
> . 000011b5: 00 PAD
> . 000011b6: 00 PAD
>
> we now get:
>
> . 000011ae: 00 00 00 00 00 00 00 00 00 PAD
>
> - fixed 52 00 00 decoded with an empty events clause, adding 'EV' for all events
> clauses now. parser writers can detect for empty event clauses by finding
> nothing after it.
>
> tools/perf/arch/arm/util/auxtrace.c | 75 +++++-
> tools/perf/arch/arm/util/pmu.c | 5 +-
> tools/perf/arch/arm64/util/Build | 3 +-
> tools/perf/arch/arm64/util/arm-spe.c | 235 +++++++++++++++++
> tools/perf/util/Build | 2 +
> tools/perf/util/arm-spe-pkt-decoder.c | 471 ++++++++++++++++++++++++++++++++++
> tools/perf/util/arm-spe-pkt-decoder.h | 52 ++++
> tools/perf/util/arm-spe.c | 318 +++++++++++++++++++++++
> tools/perf/util/arm-spe.h | 42 +++
> tools/perf/util/auxtrace.c | 3 +
> tools/perf/util/auxtrace.h | 1 +
> 11 files changed, 1199 insertions(+), 8 deletions(-)
> create mode 100644 tools/perf/arch/arm64/util/arm-spe.c
> create mode 100644 tools/perf/util/arm-spe-pkt-decoder.c
> create mode 100644 tools/perf/util/arm-spe-pkt-decoder.h
> create mode 100644 tools/perf/util/arm-spe.c
> create mode 100644 tools/perf/util/arm-spe.h
>
> diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
> index 8edf2cb71564..8e7c1ad18224 100644
> --- a/tools/perf/arch/arm/util/auxtrace.c
> +++ b/tools/perf/arch/arm/util/auxtrace.c
> @@ -22,6 +22,42 @@
> #include "../../util/evlist.h"
> #include "../../util/pmu.h"
> #include "cs-etm.h"
> +#include "arm-spe.h"
> +
> +static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err)
> +{
> + struct perf_pmu **arm_spe_pmus = NULL;
> + int ret, i, nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
> + /* arm_spe_xxxxxxxxx\0 */
> + char arm_spe_pmu_name[sizeof(ARM_SPE_PMU_NAME) + 10];
> +
> + arm_spe_pmus = zalloc(sizeof(struct perf_pmu *) * nr_cpus);
> + if (!arm_spe_pmus) {
> + pr_err("spes alloc failed\n");
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + for (i = 0; i < nr_cpus; i++) {
> + ret = sprintf(arm_spe_pmu_name, "%s%d", ARM_SPE_PMU_NAME, i);
> + if (ret < 0) {
> + pr_err("sprintf failed\n");
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + arm_spe_pmus[*nr_spes] = perf_pmu__find(arm_spe_pmu_name);
> + if (arm_spe_pmus[*nr_spes]) {
> + pr_debug2("%s %d: arm_spe_pmu %d type %d name %s\n",
> + __func__, __LINE__, *nr_spes,
> + arm_spe_pmus[*nr_spes]->type,
> + arm_spe_pmus[*nr_spes]->name);
> + (*nr_spes)++;
> + }
> + }
> +
> + return arm_spe_pmus;
> +}
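
The name buffer above is sized for the prefix plus ten more characters, but sprintf() cannot enforce that bound. A minimal sketch of a bounded variant, assuming the prefix is "arm_spe_" (the macro's definition is not in this part of the patch, and spe_format_pmu_name is a hypothetical helper, not something the patch provides):

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed value: the real ARM_SPE_PMU_NAME definition is not shown here. */
#define SPE_NAME_PREFIX "arm_spe_"

/*
 * Build "<prefix><idx>" into buf.
 * Returns 0 on success, -1 if formatting failed or the name was truncated.
 */
static int spe_format_pmu_name(char *buf, size_t size, int idx)
{
	int ret = snprintf(buf, size, "%s%d", SPE_NAME_PREFIX, idx);

	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

With a buffer of sizeof(SPE_NAME_PREFIX) + 10 bytes, any non-negative 32-bit index fits, and an undersized buffer is reported instead of overrun.
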
>
> struct auxtrace_record
> *auxtrace_record__init(struct perf_evlist *evlist, int *err)
> @@ -29,22 +65,49 @@ struct auxtrace_record
> struct perf_pmu *cs_etm_pmu;
> struct perf_evsel *evsel;
> bool found_etm = false;
> + bool found_spe = false;
> + static struct perf_pmu **arm_spe_pmus = NULL;
> + static int nr_spes = 0;
> + int i;
> +
> + if (!evlist)
> + return NULL;
>
> cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME);
>
> - if (evlist) {
> - evlist__for_each_entry(evlist, evsel) {
> - if (cs_etm_pmu &&
> - evsel->attr.type == cs_etm_pmu->type)
> - found_etm = true;
> + if (!arm_spe_pmus)
> + arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err);
> +
> + evlist__for_each_entry(evlist, evsel) {
> + if (cs_etm_pmu &&
> + evsel->attr.type == cs_etm_pmu->type)
> + found_etm = true;
> +
> + if (!nr_spes)
> + continue;
> +
> + for (i = 0; i < nr_spes; i++) {
> + if (evsel->attr.type == arm_spe_pmus[i]->type) {
> + found_spe = true;
> + break;
> + }
> }
> }
>
> + if (found_etm && found_spe) {
> + pr_err("Concurrent ARM Coresight ETM and SPE operation not currently supported\n");
> + *err = -EOPNOTSUPP;
> + return NULL;
> + }
> +
> if (found_etm)
> return cs_etm_record_init(err);
>
> + if (found_spe)
> + return arm_spe_recording_init(err, arm_spe_pmus[i]);
> +
> /*
> - * Clear 'err' even if we haven't found a cs_etm event - that way perf
> + * Clear 'err' even if we haven't found an event - that way perf
> * record can still be used even if tracers aren't present. The NULL
> * return value will take care of telling the infrastructure HW tracing
> * isn't available.
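
One thing that may be worth checking in the hunk above: arm_spe_pmus[i] is read after the evlist loop, so i holds whatever the inner loop left behind for the last evsel. If the SPE event is not the last one in the list, it looks like i would equal nr_spes at that point. A minimal sketch of remembering the match as a pointer instead, using stand-in types (mock_pmu, mock_evsel and find_matching_pmu are hypothetical, not perf APIs):

```c
#include <stddef.h>

/* Stand-ins for the perf structures; only the 'type' field matters here. */
struct mock_pmu { unsigned int type; };
struct mock_evsel { unsigned int type; };

/*
 * Return the first PMU whose type matches any event in the list, or NULL.
 * The match is returned as a pointer, so nothing depends on the value of
 * a loop counter after the loops have finished.
 */
static const struct mock_pmu *
find_matching_pmu(const struct mock_evsel *evsels, size_t nr_evsels,
		  const struct mock_pmu *pmus, size_t nr_pmus)
{
	size_t e, p;

	for (e = 0; e < nr_evsels; e++) {
		for (p = 0; p < nr_pmus; p++) {
			if (evsels[e].type == pmus[p].type)
				return &pmus[p];
		}
	}
	return NULL;
}
```
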
> diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
> index 98d67399a0d6..4c06a25ae6b1 100644
> --- a/tools/perf/arch/arm/util/pmu.c
> +++ b/tools/perf/arch/arm/util/pmu.c
> @@ -20,6 +20,7 @@
> #include <linux/perf_event.h>
>
> #include "cs-etm.h"
> +#include "arm-spe.h"
> #include "../../util/pmu.h"
>
> struct perf_event_attr
> @@ -30,7 +31,9 @@ struct perf_event_attr
> /* add ETM default config here */
> pmu->selectable = true;
> pmu->set_drv_config = cs_etm_set_drv_config;
> - }
> + } else
> + if (strstarts(pmu->name, ARM_SPE_PMU_NAME))
> + return arm_spe_pmu_default_config(pmu);

More conventional kernel style would be:

	} else if (strstarts(pmu->name, ARM_SPE_PMU_NAME)) {
		return arm_spe_pmu_default_config(pmu);
	}

Also, it looks like arm_spe_pmu_default_config() is only compiled for arm64,
so what happens if you build for arm?

> #endif
> return NULL;
> }
> diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
> index cef6fb38d17e..f9969bb88ccb 100644
> --- a/tools/perf/arch/arm64/util/Build
> +++ b/tools/perf/arch/arm64/util/Build
> @@ -3,4 +3,5 @@ libperf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
>
> libperf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \
> ../../arm/util/auxtrace.o \
> - ../../arm/util/cs-etm.o
> + ../../arm/util/cs-etm.o \
> + arm-spe.o
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> new file mode 100644
> index 000000000000..ef576b52c850
> --- /dev/null
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -0,0 +1,235 @@
> +/*
> + * ARM Statistical Profiling Extensions (SPE) support
> + * Copyright (c) 2017, ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.

Might as well switch to SPDX license identifiers, here and elsewhere.
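
For reference, a sketch of what the top of each new .c file could look like with an SPDX tag, assuming GPL-2.0 is the right identifier for the "version 2" text quoted above. Headers would use the /* */ comment form for the tag line instead of //.

```c
// SPDX-License-Identifier: GPL-2.0
/*
 * ARM Statistical Profiling Extensions (SPE) support
 * Copyright (c) 2017, ARM Ltd.
 */
```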

> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +#include <time.h>
> +
> +#include "../../util/cpumap.h"
> +#include "../../util/evsel.h"
> +#include "../../util/evlist.h"
> +#include "../../util/session.h"
> +#include "../../util/util.h"
> +#include "../../util/pmu.h"
> +#include "../../util/debug.h"
> +#include "../../util/tsc.h"

tsc.h is not needed

> +#include "../../util/auxtrace.h"
> +#include "../../util/arm-spe.h"
> +
> +#define KiB(x) ((x) * 1024)
> +#define MiB(x) ((x) * 1024 * 1024)
> +
> +struct arm_spe_recording {
> + struct auxtrace_record itr;
> + struct perf_pmu *arm_spe_pmu;
> + struct perf_evlist *evlist;
> +};
> +
> +static size_t
> +arm_spe_info_priv_size(struct auxtrace_record *itr __maybe_unused,
> + struct perf_evlist *evlist __maybe_unused)
> +{
> + return ARM_SPE_AUXTRACE_PRIV_SIZE;
> +}
> +
> +static int arm_spe_info_fill(struct auxtrace_record *itr,
> + struct perf_session *session,
> + struct auxtrace_info_event *auxtrace_info,
> + size_t priv_size)
> +{
> + struct arm_spe_recording *sper =
> + container_of(itr, struct arm_spe_recording, itr);
> + struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
> +
> + if (priv_size != ARM_SPE_AUXTRACE_PRIV_SIZE)
> + return -EINVAL;
> +
> + if (!session->evlist->nr_mmaps)
> + return -EINVAL;
> +
> + auxtrace_info->type = PERF_AUXTRACE_ARM_SPE;
> + auxtrace_info->priv[ARM_SPE_PMU_TYPE] = arm_spe_pmu->type;
> +
> + return 0;
> +}
> +
> +static int arm_spe_recording_options(struct auxtrace_record *itr,
> + struct perf_evlist *evlist,
> + struct record_opts *opts)
> +{
> + struct arm_spe_recording *sper =
> + container_of(itr, struct arm_spe_recording, itr);
> + struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
> + struct perf_evsel *evsel, *arm_spe_evsel = NULL;
> + bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
> + struct perf_evsel *tracking_evsel;
> + int err;
> +
> + sper->evlist = evlist;
> +
> + evlist__for_each_entry(evlist, evsel) {
> + if (evsel->attr.type == arm_spe_pmu->type) {
> + if (arm_spe_evsel) {
> + pr_err("There may be only one " ARM_SPE_PMU_NAME "x event\n");
> + return -EINVAL;
> + }
> + evsel->attr.freq = 0;
> + evsel->attr.sample_period = 1;
> + arm_spe_evsel = evsel;
> + opts->full_auxtrace = true;
> + }
> + }
> +
> + if (!opts->full_auxtrace)
> + return 0;
> +
> + /* We are in full trace mode but '-m,xyz' wasn't specified */
> + if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
> + if (privileged) {
> + opts->auxtrace_mmap_pages = MiB(4) / page_size;
> + } else {
> + opts->auxtrace_mmap_pages = KiB(128) / page_size;
> + if (opts->mmap_pages == UINT_MAX)
> + opts->mmap_pages = KiB(256) / page_size;
> + }
> + }
> +
> + /* Validate auxtrace_mmap_pages */
> + if (opts->auxtrace_mmap_pages) {
> + size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
> + size_t min_sz = KiB(8);
> +
> + if (sz < min_sz || !is_power_of_2(sz)) {
> + pr_err("Invalid mmap size for ARM SPE: must be at least %zuKiB and a power of 2\n",
> + min_sz / 1024);
> + return -EINVAL;
> + }
> + }
> +
> +
> + /*
> + * To obtain the auxtrace buffer file descriptor, the auxtrace event
> + * must come first.
> + */
> + perf_evlist__to_front(evlist, arm_spe_evsel);
> +
> + perf_evsel__set_sample_bit(arm_spe_evsel, CPU);
> + perf_evsel__set_sample_bit(arm_spe_evsel, TIME);
> + perf_evsel__set_sample_bit(arm_spe_evsel, TID);
> +
> + /* Add dummy event to keep tracking */
> + err = parse_events(evlist, "dummy:u", NULL);
> + if (err)
> + return err;
> +
> + tracking_evsel = perf_evlist__last(evlist);
> + perf_evlist__set_tracking_event(evlist, tracking_evsel);
> +
> + tracking_evsel->attr.freq = 0;
> + tracking_evsel->attr.sample_period = 1;
> + perf_evsel__set_sample_bit(tracking_evsel, TIME);
> + perf_evsel__set_sample_bit(tracking_evsel, CPU);
> + perf_evsel__reset_sample_bit(tracking_evsel, BRANCH_STACK);
> +
> + return 0;
> +}
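
The buffer-size validation above boils down to two conditions on the byte size: at least 8 KiB, and a power of two. A standalone sketch of that check, with hypothetical helper names (spe_is_pow2, spe_aux_size_ok) and the page size passed in explicitly rather than read from perf's global:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if n is a non-zero power of two. */
static bool spe_is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Same rule as the check in the patch: the AUX buffer must be at least
 * 8 KiB and a power of two in bytes.
 */
static bool spe_aux_size_ok(size_t pages, size_t page_size)
{
	size_t sz = pages * page_size;

	return sz >= 8 * 1024 && spe_is_pow2(sz);
}
```

With 4 KiB pages, the privileged default of MiB(4) is 1024 pages and the unprivileged default of KiB(128) is 32 pages; both satisfy the rule.
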
> +
> +static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused)
> +{
> + struct timespec ts;
> +
> + clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
> +
> + return ts.tv_sec ^ ts.tv_nsec;
> +}
> +
> +static void arm_spe_recording_free(struct auxtrace_record *itr)
> +{
> + struct arm_spe_recording *sper =
> + container_of(itr, struct arm_spe_recording, itr);
> +
> + free(sper);
> +}
> +
> +static int arm_spe_read_finish(struct auxtrace_record *itr, int idx)
> +{
> + struct arm_spe_recording *sper =
> + container_of(itr, struct arm_spe_recording, itr);
> + struct perf_evsel *evsel;
> +
> + evlist__for_each_entry(sper->evlist, evsel) {
> + if (evsel->attr.type == sper->arm_spe_pmu->type)
> + return perf_evlist__enable_event_idx(sper->evlist,
> + evsel, idx);
> + }
> + return -EINVAL;
> +}
> +
> +struct auxtrace_record *arm_spe_recording_init(int *err,
> + struct perf_pmu *arm_spe_pmu)
> +{
> + struct arm_spe_recording *sper;
> +
> + if (!arm_spe_pmu) {
> + *err = -ENODEV;
> + return NULL;
> + }
> +
> + sper = zalloc(sizeof(struct arm_spe_recording));
> + if (!sper) {
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + sper->arm_spe_pmu = arm_spe_pmu;
> + sper->itr.recording_options = arm_spe_recording_options;
> + sper->itr.info_priv_size = arm_spe_info_priv_size;
> + sper->itr.info_fill = arm_spe_info_fill;
> + sper->itr.free = arm_spe_recording_free;
> + sper->itr.reference = arm_spe_reference;
> + sper->itr.read_finish = arm_spe_read_finish;
> + sper->itr.alignment = 0;
> +
> + return &sper->itr;
> +}
> +
> +struct perf_event_attr
> +*arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu)
> +{
> + struct perf_event_attr *attr;
> +
> + attr = zalloc(sizeof(struct perf_event_attr));
> + if (!attr) {
> + pr_err("arm_spe default config cannot allocate a perf_event_attr\n");
> + return NULL;
> + }
> +
> + /*
> + * If kernel driver doesn't advertise a minimum,
> + * use max allowable by PMSIDR_EL1.INTERVAL
> + */
> + if (perf_pmu__scan_file(arm_spe_pmu, "caps/min_interval", "%llu",
> + &attr->sample_period) != 1) {
> + pr_debug("arm_spe driver doesn't advertise a min. interval. Using 4096\n");
> + attr->sample_period = 4096;
> + }
> +
> + arm_spe_pmu->selectable = true;
> + arm_spe_pmu->is_uncore = false;
> +
> + return attr;
> +}
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index a3de7916fe63..7c6a8b461e24 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -86,6 +86,8 @@ libperf-$(CONFIG_AUXTRACE) += auxtrace.o
> libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
> libperf-$(CONFIG_AUXTRACE) += intel-pt.o
> libperf-$(CONFIG_AUXTRACE) += intel-bts.o
> +libperf-$(CONFIG_AUXTRACE) += arm-spe.o
> +libperf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
> libperf-y += parse-branch-options.o
> libperf-y += dump-insn.o
> libperf-y += parse-regs-options.o
> diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-pkt-decoder.c
> new file mode 100644
> index 000000000000..234943471d30
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-pkt-decoder.c
> @@ -0,0 +1,471 @@
> +/*
> + * ARM Statistical Profiling Extensions (SPE) support
> + * Copyright (c) 2017, ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <endian.h>
> +#include <byteswap.h>
> +
> +#include "arm-spe-pkt-decoder.h"
> +
> +#define BIT(n) (1ULL << (n))
> +
> +#define NS_FLAG BIT(63)
> +#define EL_FLAG (BIT(62) | BIT(61))
> +
> +#define SPE_HEADER0_PAD 0x0
> +#define SPE_HEADER0_END 0x1
> +#define SPE_HEADER0_ADDRESS 0x30 /* address packet (short) */
> +#define SPE_HEADER0_ADDRESS_MASK 0x38
> +#define SPE_HEADER0_COUNTER 0x18 /* counter packet (short) */
> +#define SPE_HEADER0_COUNTER_MASK 0x38
> +#define SPE_HEADER0_TIMESTAMP 0x71
> +#define SPE_HEADER0_TIMESTAMP 0x71
> +#define SPE_HEADER0_EVENTS 0x2
> +#define SPE_HEADER0_EVENTS_MASK 0xf
> +#define SPE_HEADER0_SOURCE 0x3
> +#define SPE_HEADER0_SOURCE_MASK 0xf
> +#define SPE_HEADER0_CONTEXT 0x24
> +#define SPE_HEADER0_CONTEXT_MASK 0x3c
> +#define SPE_HEADER0_OP_TYPE 0x8
> +#define SPE_HEADER0_OP_TYPE_MASK 0x3c
> +#define SPE_HEADER1_ALIGNMENT 0x0
> +#define SPE_HEADER1_ADDRESS 0xb0 /* address packet (extended) */
> +#define SPE_HEADER1_ADDRESS_MASK 0xf8
> +#define SPE_HEADER1_COUNTER 0x98 /* counter packet (extended) */
> +#define SPE_HEADER1_COUNTER_MASK 0xf8
> +
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +#define le16_to_cpu bswap_16
> +#define le32_to_cpu bswap_32
> +#define le64_to_cpu bswap_64
> +#define memcpy_le64(d, s, n) do { \
> + memcpy((d), (s), (n)); \
> + *(d) = le64_to_cpu(*(d)); \
> +} while (0)
> +#else
> +#define le16_to_cpu
> +#define le32_to_cpu
> +#define le64_to_cpu
> +#define memcpy_le64 memcpy
> +#endif
> +
> +static const char * const arm_spe_packet_name[] = {
> + [ARM_SPE_PAD] = "PAD",
> + [ARM_SPE_END] = "END",
> + [ARM_SPE_TIMESTAMP] = "TS",
> + [ARM_SPE_ADDRESS] = "ADDR",
> + [ARM_SPE_COUNTER] = "LAT",
> + [ARM_SPE_CONTEXT] = "CONTEXT",
> + [ARM_SPE_OP_TYPE] = "OP-TYPE",
> + [ARM_SPE_EVENTS] = "EVENTS",
> + [ARM_SPE_DATA_SOURCE] = "DATA-SOURCE",
> +};
> +
> +const char *arm_spe_pkt_name(enum arm_spe_pkt_type type)
> +{
> + return arm_spe_packet_name[type];
> +}
> +
> +/* return ARM SPE payload size from its encoding,
> + * which is in bits 5:4 of the byte.
> + * 00 : byte
> + * 01 : halfword (2)
> + * 10 : word (4)
> + * 11 : doubleword (8)
> + */
> +static int payloadlen(unsigned char byte)
> +{
> + return 1 << ((byte & 0x30) >> 4);
> +}
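
The size encoding can be checked against the example dump in the commit message. A small standalone sketch (spe_payload_len is a hypothetical name for the same computation):

```c
#include <stddef.h>

/* Payload size in bytes, encoded in bits 5:4 of the header byte. */
static size_t spe_payload_len(unsigned char header)
{
	return (size_t)1 << ((header & 0x30) >> 4);
}
```

Header 0x49 ("49 00 LD") and 0x42 ("42 16 EV ...") carry one payload byte, 0x52 ("52 00 00") carries two, and 0x71 (the TS packet) carries eight, which matches the packet lengths shown in the dump.
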
> +
> +static int arm_spe_get_payload(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + size_t payload_len = payloadlen(buf[0]);
> +
> + if (len < 1 + payload_len)
> + return ARM_SPE_NEED_MORE_BYTES;
> +
> + buf++;
> +
> + switch (payload_len) {
> + case 1: packet->payload = *(uint8_t *)buf; break;
> + case 2: packet->payload = le16_to_cpu(*(uint16_t *)buf); break;
> + case 4: packet->payload = le32_to_cpu(*(uint32_t *)buf); break;
> + case 8: packet->payload = le64_to_cpu(*(uint64_t *)buf); break;
> + default: return ARM_SPE_BAD_PACKET;
> + }
> +
> + return 1 + payload_len;
> +}
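
The switch above reads the payload through uint16_t/uint32_t/uint64_t pointer casts. The payload starts one byte after the header, so those pointers will generally not be naturally aligned, which some targets and sanitizers may object to. A minimal sketch of a byte-wise alternative (spe_read_le is a hypothetical helper, not part of the patch):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble a little-endian value of 1, 2, 4 or 8 bytes one byte at a time.
 * There are no pointer casts, so neither alignment nor host byte order
 * matters.
 */
static uint64_t spe_read_le(const unsigned char *p, size_t n)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}
```

Feeding it the TS payload bytes from the example dump (00 20 fa fd 16 00 00 00) gives 98750308352, the value printed there.
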
> +
> +static int arm_spe_get_pad(struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_PAD;
> + return 1;
> +}
> +
> +static int arm_spe_get_alignment(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + unsigned int alignment = 1 << ((buf[0] & 0xf) + 1);
> +
> + if (len < alignment)
> + return ARM_SPE_NEED_MORE_BYTES;
> +
> + packet->type = ARM_SPE_PAD;
> + return alignment - (((uint64_t)buf) & (alignment - 1));
> +}
> +
> +static int arm_spe_get_end(struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_END;
> + return 1;
> +}
> +
> +static int arm_spe_get_timestamp(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_TIMESTAMP;
> + return arm_spe_get_payload(buf, len, packet);
> +}
> +
> +static int arm_spe_get_events(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + int ret = arm_spe_get_payload(buf, len, packet);
> +
> + packet->type = ARM_SPE_EVENTS;
> +
> + /* we use index to identify Events with a less number of
> + * comparisons in arm_spe_pkt_desc(): E.g., the LLC-ACCESS,
> + * LLC-REFILL, and REMOTE-ACCESS events are identified iff
> + * index > 1.
> + */
> + packet->index = ret - 1;
> +
> + return ret;
> +}
> +
> +static int arm_spe_get_data_source(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_DATA_SOURCE;
> + return arm_spe_get_payload(buf, len, packet);
> +}
> +
> +static int arm_spe_get_context(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_CONTEXT;
> + packet->index = buf[0] & 0x3;
> +
> + return arm_spe_get_payload(buf, len, packet);
> +}
> +
> +static int arm_spe_get_op_type(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + packet->type = ARM_SPE_OP_TYPE;
> + packet->index = buf[0] & 0x3;
> + return arm_spe_get_payload(buf, len, packet);
> +}
> +
> +static int arm_spe_get_counter(const unsigned char *buf, size_t len,
> + const unsigned char ext_hdr, struct arm_spe_pkt *packet)
> +{
> + if (len < 2)
> + return ARM_SPE_NEED_MORE_BYTES;
> +
> + packet->type = ARM_SPE_COUNTER;
> + if (ext_hdr)
> + packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
> + else
> + packet->index = buf[0] & 0x7;
> +
> + packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
> +
> + return 1 + ext_hdr + 2;
> +}
> +
> +static int arm_spe_get_addr(const unsigned char *buf, size_t len,
> + const unsigned char ext_hdr, struct arm_spe_pkt *packet)
> +{
> + if (len < 8)
> + return ARM_SPE_NEED_MORE_BYTES;
> +
> + packet->type = ARM_SPE_ADDRESS;
> + if (ext_hdr)
> + packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
> + else
> + packet->index = buf[0] & 0x7;
> +
> + memcpy_le64(&packet->payload, buf + 1, 8);
> +
> + return 1 + ext_hdr + 8;
> +}
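
To illustrate how a short address packet from the example dump is interpreted: the low three header bits select the index (0 = PC, 1 = TGT, 2 = VA, 3 = PA in the description code later in this patch), and for PC/TGT the top payload byte carries NS and EL. A standalone sketch with hypothetical names (spe_addr_fields, spe_short_index, spe_split_addr):

```c
#include <stdint.h>

struct spe_addr_fields {
	unsigned int ns;	/* bit 63 */
	unsigned int el;	/* bits 62:61 */
	uint64_t addr;		/* payload with the top byte cleared */
};

/* Index of a short (one-byte header) address or counter packet. */
static unsigned int spe_short_index(unsigned char header)
{
	return header & 0x7;
}

/* Split a PC/TGT address payload the way the description code prints it. */
static struct spe_addr_fields spe_split_addr(uint64_t payload)
{
	struct spe_addr_fields f;

	f.ns = (unsigned int)(payload >> 63) & 0x1;
	f.el = (unsigned int)(payload >> 61) & 0x3;
	f.addr = payload & ~(0xffULL << 56);
	return f;
}
```

For the dump line "b0 b0 c9 15 08 00 00 ff ff", header 0xb0 gives index 0 (PC), and payload 0xffff00000815c9b0 splits into ns=1, el=3 and address 0xff00000815c9b0, as printed.
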
> +
> +static int arm_spe_do_get_packet(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + unsigned int byte;
> +
> + memset(packet, 0, sizeof(struct arm_spe_pkt));
> +
> + if (!len)
> + return ARM_SPE_NEED_MORE_BYTES;
> +
> + byte = buf[0];
> + if (byte == SPE_HEADER0_PAD)
> + return arm_spe_get_pad(packet);
> + else if (byte == SPE_HEADER0_END) /* no timestamp at end of record */
> + return arm_spe_get_end(packet);
> + else if (byte & 0xc0 /* 0y11xxxxxx */) {
> + if (byte & 0x80) {
> + if ((byte & SPE_HEADER0_ADDRESS_MASK) == SPE_HEADER0_ADDRESS)
> + return arm_spe_get_addr(buf, len, 0, packet);
> + if ((byte & SPE_HEADER0_COUNTER_MASK) == SPE_HEADER0_COUNTER)
> + return arm_spe_get_counter(buf, len, 0, packet);
> + } else
> + if (byte == SPE_HEADER0_TIMESTAMP)
> + return arm_spe_get_timestamp(buf, len, packet);
> + else if ((byte & SPE_HEADER0_EVENTS_MASK) == SPE_HEADER0_EVENTS)
> + return arm_spe_get_events(buf, len, packet);
> + else if ((byte & SPE_HEADER0_SOURCE_MASK) == SPE_HEADER0_SOURCE)
> + return arm_spe_get_data_source(buf, len, packet);
> + else if ((byte & SPE_HEADER0_CONTEXT_MASK) == SPE_HEADER0_CONTEXT)
> + return arm_spe_get_context(buf, len, packet);
> + else if ((byte & SPE_HEADER0_OP_TYPE_MASK) == SPE_HEADER0_OP_TYPE)
> + return arm_spe_get_op_type(buf, len, packet);
> + } else if ((byte & 0xe0) == 0x20 /* 0y001xxxxx */) {
> + /* 16-bit header */
> + byte = buf[1];
> + if (byte == SPE_HEADER1_ALIGNMENT)
> + return arm_spe_get_alignment(buf, len, packet);
> + else if ((byte & SPE_HEADER1_ADDRESS_MASK) == SPE_HEADER1_ADDRESS)
> + return arm_spe_get_addr(buf, len, 1, packet);
> + else if ((byte & SPE_HEADER1_COUNTER_MASK) == SPE_HEADER1_COUNTER)
> + return arm_spe_get_counter(buf, len, 1, packet);
> + }
> +
> + return ARM_SPE_BAD_PACKET;
> +}
> +
> +int arm_spe_get_packet(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet)
> +{
> + int ret;
> +
> + ret = arm_spe_do_get_packet(buf, len, packet);
> + /* put multiple consecutive PADs on the same line, up to
> + * the fixed-width output format of 16 bytes per line.
> + */
> + if (ret > 0 && packet->type == ARM_SPE_PAD) {
> + while (ret < 16 && len > (size_t)ret && !buf[ret])
> + ret += 1;
> + }
> + return ret;
> +}
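
The PAD coalescing above can be sketched on its own as a run-length count capped at 16 bytes. spe_pad_run is a hypothetical name, and the sketch assumes the caller has already established that buf[0] is a PAD byte and len is at least 1:

```c
#include <stddef.h>

/*
 * Length of a run of PAD (0x00) bytes starting at buf[0], capped at 16
 * so one dump line never exceeds the fixed-width hex column.
 */
static size_t spe_pad_run(const unsigned char *buf, size_t len)
{
	size_t n = 1;

	while (n < 16 && n < len && buf[n] == 0)
		n++;
	return n;
}
```

Nine zero bytes followed by a non-zero header yield a single 9-byte PAD, as in the v2 changelog example; a longer run is split into 16-byte chunks.
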
> +
> +int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> + size_t buf_len)
> +{
> + int ret, ns, el, index = packet->index;
> + unsigned long long payload = packet->payload;
> + const char *name = arm_spe_pkt_name(packet->type);
> +
> + switch (packet->type) {
> + case ARM_SPE_BAD:
> + case ARM_SPE_PAD:
> + case ARM_SPE_END:
> + return snprintf(buf, buf_len, "%s", name);
> + case ARM_SPE_EVENTS: {
> + size_t blen = buf_len;
> +
> + ret = 0;
> + ret = snprintf(buf, buf_len, "EV");
> + buf += ret;
> + blen -= ret;
> + if (payload & 0x1) {
> + ret = snprintf(buf, buf_len, " EXCEPTION-GEN");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x2) {
> + ret = snprintf(buf, buf_len, " RETIRED");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x4) {
> + ret = snprintf(buf, buf_len, " L1D-ACCESS");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x8) {
> + ret = snprintf(buf, buf_len, " L1D-REFILL");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x10) {
> + ret = snprintf(buf, buf_len, " TLB-ACCESS");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x20) {
> + ret = snprintf(buf, buf_len, " TLB-REFILL");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x40) {
> + ret = snprintf(buf, buf_len, " NOT-TAKEN");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x80) {
> + ret = snprintf(buf, buf_len, " MISPRED");
> + buf += ret;
> + blen -= ret;
> + }
> + if (index > 1) {
> + if (payload & 0x100) {
> + ret = snprintf(buf, buf_len, " LLC-ACCESS");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x200) {
> + ret = snprintf(buf, buf_len, " LLC-REFILL");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x400) {
> + ret = snprintf(buf, buf_len, " REMOTE-ACCESS");
> + buf += ret;
> + blen -= ret;
> + }
> + }
> + if (ret < 0)
> + return ret;
> + blen -= ret;
> + return buf_len - blen;
> + }
> + case ARM_SPE_OP_TYPE:
> + switch (index) {
> + case 0: return snprintf(buf, buf_len, "%s", payload & 0x1 ?
> + "COND-SELECT" : "INSN-OTHER");
> + case 1: {
> + size_t blen = buf_len;
> +
> + if (payload & 0x1)
> + ret = snprintf(buf, buf_len, "ST");
> + else
> + ret = snprintf(buf, buf_len, "LD");
> + buf += ret;
> + blen -= ret;
> + if (payload & 0x2) {
> + if (payload & 0x4) {
> + ret = snprintf(buf, buf_len, " AT");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x8) {
> + ret = snprintf(buf, buf_len, " EXCL");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x10) {
> + ret = snprintf(buf, buf_len, " AR");
> + buf += ret;
> + blen -= ret;
> + }
> + } else if (payload & 0x4) {
> + ret = snprintf(buf, buf_len, " SIMD-FP");
> + buf += ret;
> + blen -= ret;
> + }
> + if (ret < 0)
> + return ret;
> + blen -= ret;
> + return buf_len - blen;
> + }
> + case 2: {
> + size_t blen = buf_len;
> +
> + ret = snprintf(buf, buf_len, "B");
> + buf += ret;
> + blen -= ret;
> + if (payload & 0x1) {
> + ret = snprintf(buf, buf_len, " COND");
> + buf += ret;
> + blen -= ret;
> + }
> + if (payload & 0x2) {
> + ret = snprintf(buf, buf_len, " IND");
> + buf += ret;
> + blen -= ret;
> + }
> + if (ret < 0)
> + return ret;
> + blen -= ret;
> + return buf_len - blen;
> + }
> + default: return 0;
> + }
> + case ARM_SPE_DATA_SOURCE:
> + case ARM_SPE_TIMESTAMP:
> + return snprintf(buf, buf_len, "%s %lld", name, payload);
> + case ARM_SPE_ADDRESS:
> + switch (index) {
> + case 0:
> + case 1: ns = !!(packet->payload & NS_FLAG);
> + el = (packet->payload & EL_FLAG) >> 61;
> + payload &= ~(0xffULL << 56);
> + return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
> + (index == 1) ? "TGT" : "PC", payload, el, ns);
> + case 2: return snprintf(buf, buf_len, "VA 0x%llx", payload);
> + case 3: ns = !!(packet->payload & NS_FLAG);
> + payload &= ~(0xffULL << 56);
> + return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
> + payload, ns);
> + default: return 0;
> + }
> + case ARM_SPE_CONTEXT:
> + return snprintf(buf, buf_len, "%s 0x%lx el%d", name,
> + (unsigned long)payload, index + 1);
> + case ARM_SPE_COUNTER: {
> + size_t blen = buf_len;
> +
> + ret = snprintf(buf, buf_len, "%s %d ", name,
> + (unsigned short)payload);
> + buf += ret;
> + blen -= ret;
> + switch (index) {
> + case 0: ret = snprintf(buf, buf_len, "TOT"); break;
> + case 1: ret = snprintf(buf, buf_len, "ISSUE"); break;
> + case 2: ret = snprintf(buf, buf_len, "XLAT"); break;
> + default: ret = 0;
> + }
> + if (ret < 0)
> + return ret;
> + blen -= ret;
> + return buf_len - blen;
> + }
> + default:
> + break;
> + }
> +
> + return snprintf(buf, buf_len, "%s 0x%llx (%d)",
> + name, payload, packet->index);
> +}
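
As far as I can tell, the snprintf() calls in arm_spe_pkt_desc() are passed buf_len rather than the remaining blen after buf has been advanced, and the trailing "blen -= ret" looks like it counts the last fragment twice. A minimal sketch of a bounded-append pattern for the events case, limited to the eight low event bits; spe_append, spe_event_names and spe_events_desc are hypothetical names, not part of the patch:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append formatted text at *pos, never writing past buf + size. */
static int spe_append(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (*pos >= size)
		return -1;
	va_start(ap, fmt);
	ret = vsnprintf(buf + *pos, size - *pos, fmt, ap);
	va_end(ap);
	if (ret < 0 || (size_t)ret >= size - *pos)
		return -1;	/* error or truncated */
	*pos += (size_t)ret;
	return 0;
}

/* Bit positions follow the payload masks used in the patch (0x1 ... 0x80). */
static const char * const spe_event_names[] = {
	"EXCEPTION-GEN", "RETIRED", "L1D-ACCESS", "L1D-REFILL",
	"TLB-ACCESS", "TLB-REFILL", "NOT-TAKEN", "MISPRED",
};

/* Returns the number of characters written, or -1 if buf is too small. */
static int spe_events_desc(uint64_t payload, char *buf, size_t size)
{
	size_t pos = 0, i;

	if (spe_append(buf, size, &pos, "EV"))
		return -1;
	for (i = 0; i < sizeof(spe_event_names) / sizeof(spe_event_names[0]); i++) {
		if ((payload & (1ULL << i)) &&
		    spe_append(buf, size, &pos, " %s", spe_event_names[i]))
			return -1;
	}
	return (int)pos;
}
```

With payload 0x16 from the example dump this produces "EV RETIRED L1D-ACCESS TLB-ACCESS". The table also keeps the bit-to-name mapping in one place, which may make later additions easier to review.
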
> diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-pkt-decoder.h
> new file mode 100644
> index 000000000000..f146f4143447
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-pkt-decoder.h
> @@ -0,0 +1,52 @@
> +/*
> + * ARM Statistical Profiling Extensions (SPE) support
> + * Copyright (c) 2017, ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#ifndef INCLUDE__ARM_SPE_PKT_DECODER_H__
> +#define INCLUDE__ARM_SPE_PKT_DECODER_H__
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +#define ARM_SPE_PKT_DESC_MAX 256
> +
> +#define ARM_SPE_NEED_MORE_BYTES -1
> +#define ARM_SPE_BAD_PACKET -2
> +
> +enum arm_spe_pkt_type {
> + ARM_SPE_BAD,
> + ARM_SPE_PAD,
> + ARM_SPE_END,
> + ARM_SPE_TIMESTAMP,
> + ARM_SPE_ADDRESS,
> + ARM_SPE_COUNTER,
> + ARM_SPE_CONTEXT,
> + ARM_SPE_OP_TYPE,
> + ARM_SPE_EVENTS,
> + ARM_SPE_DATA_SOURCE,
> +};
> +
> +struct arm_spe_pkt {
> + enum arm_spe_pkt_type type;
> + unsigned char index;
> + uint64_t payload;
> +};
> +
> +const char *arm_spe_pkt_name(enum arm_spe_pkt_type);
> +
> +int arm_spe_get_packet(const unsigned char *buf, size_t len,
> + struct arm_spe_pkt *packet);
> +
> +int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len);
> +#endif
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> new file mode 100644
> index 000000000000..67965e26b5b1
> --- /dev/null
> +++ b/tools/perf/util/arm-spe.c
> @@ -0,0 +1,318 @@
> +/*
> + * ARM Statistical Profiling Extensions (SPE) support
> + * Copyright (c) 2017, ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <endian.h>
> +#include <errno.h>
> +#include <byteswap.h>
> +#include <inttypes.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +
> +#include "cpumap.h"
> +#include "color.h"
> +#include "evsel.h"
> +#include "evlist.h"
> +#include "machine.h"
> +#include "session.h"
> +#include "util.h"
> +#include "thread.h"
> +#include "debug.h"
> +#include "auxtrace.h"
> +#include "arm-spe.h"
> +#include "arm-spe-pkt-decoder.h"
> +
> +struct arm_spe {
> + struct auxtrace auxtrace;
> + struct auxtrace_queues queues;
> + struct auxtrace_heap heap;
> + u32 auxtrace_type;
> + struct perf_session *session;
> + struct machine *machine;
> + u32 pmu_type;
> +};
> +
> +struct arm_spe_queue {
> + struct arm_spe *spe;
> + unsigned int queue_nr;
> + struct auxtrace_buffer *buffer;
> + bool on_heap;
> + bool done;
> + pid_t pid;
> + pid_t tid;
> + int cpu;
> +};
> +
> +static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
> + unsigned char *buf, size_t len)
> +{
> + struct arm_spe_pkt packet;
> + size_t pos = 0;
> + int ret, pkt_len, i;
> + char desc[ARM_SPE_PKT_DESC_MAX];
> + const char *color = PERF_COLOR_BLUE;
> +
> + color_fprintf(stdout, color,
> + ". ... ARM SPE data: size %zu bytes\n",
> + len);
> +
> + while (len) {
> + ret = arm_spe_get_packet(buf, len, &packet);
> + if (ret > 0)
> + pkt_len = ret;
> + else
> + pkt_len = 1;
> + printf(".");
> + color_fprintf(stdout, color, " %08x: ", pos);
> + for (i = 0; i < pkt_len; i++)
> + color_fprintf(stdout, color, " %02x", buf[i]);
> + for (; i < 16; i++)
> + color_fprintf(stdout, color, " ");
> + if (ret > 0) {
> + ret = arm_spe_pkt_desc(&packet, desc,
> + ARM_SPE_PKT_DESC_MAX);
> + if (ret > 0)
> + color_fprintf(stdout, color, " %s\n", desc);
> + } else {
> + color_fprintf(stdout, color, " Bad packet!\n");
> + }
> + pos += pkt_len;
> + buf += pkt_len;
> + len -= pkt_len;
> + }
> +}
> +
> +static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
> + size_t len)
> +{
> + printf(".\n");
> + arm_spe_dump(spe, buf, len);
> +}
> +
> +static struct arm_spe_queue *arm_spe_alloc_queue(struct arm_spe *spe,
> + unsigned int queue_nr)
> +{
> + struct arm_spe_queue *speq;
> +
> + speq = zalloc(sizeof(struct arm_spe_queue));
> + if (!speq)
> + return NULL;
> +
> + speq->spe = spe;
> + speq->queue_nr = queue_nr;
> + speq->pid = -1;
> + speq->tid = -1;
> + speq->cpu = -1;
> +
> + return speq;
> +}
> +
> +static int arm_spe_setup_queue(struct arm_spe *spe,
> + struct auxtrace_queue *queue,
> + unsigned int queue_nr)
> +{
> + struct arm_spe_queue *speq = queue->priv;
> +
> + if (list_empty(&queue->head))
> + return 0;
> +
> + if (!speq) {
> + speq = arm_spe_alloc_queue(spe, queue_nr);
> + if (!speq)
> + return -ENOMEM;
> + queue->priv = speq;
> +
> + if (queue->cpu != -1)
> + speq->cpu = queue->cpu;
> + speq->tid = queue->tid;
> + }
> +
> + if (!speq->on_heap && !speq->buffer) {
> + int ret;
> +
> + speq->buffer = auxtrace_buffer__next(queue, NULL);
> + if (!speq->buffer)
> + return 0;
> +
> + ret = auxtrace_heap__add(&spe->heap, queue_nr,
> + speq->buffer->reference);
> + if (ret)
> + return ret;
> + speq->on_heap = true;
> + }
> +
> + return 0;
> +}
> +
> +static int arm_spe_setup_queues(struct arm_spe *spe)
> +{
> + unsigned int i;
> + int ret;
> +
> + for (i = 0; i < spe->queues.nr_queues; i++) {
> + ret = arm_spe_setup_queue(spe, &spe->queues.queue_array[i],
> + i);
> + if (ret)
> + return ret;
> + }
> + return 0;
> +}
> +
> +static inline int arm_spe_update_queues(struct arm_spe *spe)
> +{
> + if (spe->queues.new_data) {
> + spe->queues.new_data = false;
> + return arm_spe_setup_queues(spe);
> + }
> + return 0;
> +}
> +
> +static int arm_spe_process_event(struct perf_session *session __maybe_unused,
> + union perf_event *event __maybe_unused,
> + struct perf_sample *sample __maybe_unused,
> + struct perf_tool *tool __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static int arm_spe_process_auxtrace_event(struct perf_session *session,
> + union perf_event *event,
> + struct perf_tool *tool __maybe_unused)
> +{
> + struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
> + auxtrace);
> + struct auxtrace_buffer *buffer;
> + off_t data_offset;
> + int fd = perf_data__fd(session->data);
> + int err;
> +
> + if (perf_data__is_pipe(session->data)) {
> + data_offset = 0;
> + } else {
> + data_offset = lseek(fd, 0, SEEK_CUR);
> + if (data_offset == -1)
> + return -errno;
> + }
> +
> + err = auxtrace_queues__add_event(&spe->queues, session, event,
> + data_offset, &buffer);
> + if (err)
> + return err;
> +
> + /* Dump here now we have copied a piped trace out of the pipe */
> + if (dump_trace) {
> + if (auxtrace_buffer__get_data(buffer, fd)) {
> + arm_spe_dump_event(spe, buffer->data,
> + buffer->size);
> + auxtrace_buffer__put_data(buffer);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int arm_spe_flush(struct perf_session *session __maybe_unused,
> + struct perf_tool *tool __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static void arm_spe_free_queue(void *priv)
> +{
> + struct arm_spe_queue *speq = priv;
> +
> + if (!speq)
> + return;
> + free(speq);
> +}
> +
> +static void arm_spe_free_events(struct perf_session *session)
> +{
> + struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
> + auxtrace);
> + struct auxtrace_queues *queues = &spe->queues;
> + unsigned int i;
> +
> + for (i = 0; i < queues->nr_queues; i++) {
> + arm_spe_free_queue(queues->queue_array[i].priv);
> + queues->queue_array[i].priv = NULL;
> + }
> + auxtrace_queues__free(queues);
> +}
> +
> +static void arm_spe_free(struct perf_session *session)
> +{
> + struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
> + auxtrace);
> +
> + auxtrace_heap__free(&spe->heap);
> + arm_spe_free_events(session);
> + session->auxtrace = NULL;
> + free(spe);
> +}
> +
> +static const char * const arm_spe_info_fmts[] = {
> + [ARM_SPE_PMU_TYPE] = " PMU Type %"PRId64"\n",
> +};
> +
> +static void arm_spe_print_info(u64 *arr)
> +{
> + if (!dump_trace)
> + return;
> +
> + fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
> +}
> +
> +int arm_spe_process_auxtrace_info(union perf_event *event,
> + struct perf_session *session)
> +{
> + struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
> + size_t min_sz = sizeof(u64) * ARM_SPE_PMU_TYPE;
> + struct arm_spe *spe;
> + int err;
> +
> + if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
> + min_sz)
> + return -EINVAL;
> +
> + spe = zalloc(sizeof(struct arm_spe));
> + if (!spe)
> + return -ENOMEM;
> +
> + err = auxtrace_queues__init(&spe->queues);
> + if (err)
> + goto err_free;
> +
> + spe->session = session;
> + spe->machine = &session->machines.host; /* No kvm support */
> + spe->auxtrace_type = auxtrace_info->type;
> + spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
> +
> + spe->auxtrace.process_event = arm_spe_process_event;
> + spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
> + spe->auxtrace.flush_events = arm_spe_flush;
> + spe->auxtrace.free_events = arm_spe_free_events;
> + spe->auxtrace.free = arm_spe_free;
> + session->auxtrace = &spe->auxtrace;
> +
> + arm_spe_print_info(&auxtrace_info->priv[0]);
> +
> + return 0;
> +
> +err_free:
> + free(spe);
> + return err;
> +}
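
ARM_SPE_PMU_TYPE is 0 in the enum in arm-spe.h, so min_sz above evaluates to 0 and the size check only covers the fixed part of the event, yet priv[ARM_SPE_PMU_TYPE] is read afterwards. A sketch of a check that covers the last index read, assuming priv[] is a trailing flexible array; struct info_event, PRIV_PMU_TYPE and priv_index_in_bounds() are stand-ins for illustration, not perf definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the event layout: fixed header followed by u64 priv[]. */
struct info_event {
	uint16_t size;		/* total event size in bytes */
	uint32_t type;
	uint64_t priv[];	/* flexible array, not counted by sizeof */
};

enum {
	PRIV_PMU_TYPE,		/* == 0, as in the patch */
	PRIV_MAX,
};

/* Return 1 if priv[last_idx] lies within an event of total_size bytes. */
static int priv_index_in_bounds(size_t total_size, unsigned int last_idx)
{
	/* last_idx + 1 entries are needed to make priv[last_idx] readable. */
	size_t min_sz = sizeof(uint64_t) * ((size_t)last_idx + 1);

	return total_size >= sizeof(struct info_event) + min_sz;
}
```

In the patch itself, using ARM_SPE_AUXTRACE_PRIV_SIZE for min_sz might be the simplest equivalent.
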
> diff --git a/tools/perf/util/arm-spe.h b/tools/perf/util/arm-spe.h
> new file mode 100644
> index 000000000000..80752b20d850
> --- /dev/null
> +++ b/tools/perf/util/arm-spe.h
> @@ -0,0 +1,42 @@
> +/*
> + * ARM Statistical Profiling Extensions (SPE) support
> + * Copyright (c) 2017, ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#ifndef INCLUDE__PERF_ARM_SPE_H__
> +#define INCLUDE__PERF_ARM_SPE_H__
> +
> +#define ARM_SPE_PMU_NAME "arm_spe_"
> +
> +enum {
> + ARM_SPE_PMU_TYPE,
> + ARM_SPE_PER_CPU_MMAPS,
> + ARM_SPE_AUXTRACE_PRIV_MAX,
> +};
> +
> +#define ARM_SPE_AUXTRACE_PRIV_SIZE (ARM_SPE_AUXTRACE_PRIV_MAX * sizeof(u64))
> +
> +struct auxtrace_record;
> +struct perf_tool;

struct perf_tool is not used in this header. struct auxtrace_record is still needed for the arm_spe_recording_init() declaration below.

> +union perf_event;
> +struct perf_session;
> +struct perf_pmu;
> +
> +struct auxtrace_record *arm_spe_recording_init(int *err,
> + struct perf_pmu *arm_spe_pmu);
> +
> +int arm_spe_process_auxtrace_info(union perf_event *event,
> + struct perf_session *session);
> +
> +struct perf_event_attr *arm_spe_pmu_default_config(struct perf_pmu *arm_spe_pmu);
> +#endif
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index a33491416400..f682f7a58a02 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -57,6 +57,7 @@
>
> #include "intel-pt.h"
> #include "intel-bts.h"
> +#include "arm-spe.h"
>
> #include "sane_ctype.h"
> #include "symbol/kallsyms.h"
> @@ -913,6 +914,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
> return intel_pt_process_auxtrace_info(event, session);
> case PERF_AUXTRACE_INTEL_BTS:
> return intel_bts_process_auxtrace_info(event, session);
> + case PERF_AUXTRACE_ARM_SPE:
> + return arm_spe_process_auxtrace_info(event, session);
> case PERF_AUXTRACE_CS_ETM:
> case PERF_AUXTRACE_UNKNOWN:
> default:
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index d19e11b68de7..453c148d2158 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -43,6 +43,7 @@ enum auxtrace_type {
> PERF_AUXTRACE_INTEL_PT,
> PERF_AUXTRACE_INTEL_BTS,
> PERF_AUXTRACE_CS_ETM,
> + PERF_AUXTRACE_ARM_SPE,
> };
>
> enum itrace_period_type {
>