Re: [PATCH v4 5/6] perf vendor events arm64: Update stall_slot workaround for N2 r0p3
From: James Clark
Date: Wed Aug 09 2023 - 09:06:41 EST
On 08/08/2023 11:18, John Garry wrote:
> On 07/08/2023 15:20, James Clark wrote:
>
> Hi James,
>
> Thanks for the effort in doing this.
>
>> N2 r0p3 doesn't require the workaround [1], so gating on (#slots - 5) no
>> longer works because all N2s have 5 slots. Add a new expression builtin
>> that identifies the need for the workaround correctly.
>>
>> [1]:
>> https://gitlab.arm.com/telemetry-solution/telemetry-solution/-/blob/main/data/pmu/cpu/neoverse/neoverse-n2-r0p3.json
>> Signed-off-by: James Clark <james.clark@xxxxxxx>
>> ---
>> tools/perf/arch/arm64/util/pmu.c | 21 +++++++++++++++++++
>> .../arm64/arm/neoverse-n2-v2/metrics.json | 8 +++----
>> tools/perf/util/expr.c | 4 ++++
>> tools/perf/util/pmu.c | 6 ++++++
>> tools/perf/util/pmu.h | 1 +
>> 5 files changed, 36 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/arch/arm64/util/pmu.c
>> b/tools/perf/arch/arm64/util/pmu.c
>> index 561de0cb6b95..30e2385a83cf 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -2,6 +2,7 @@
>> #include <internal/cpumap.h>
>> #include "../../../util/cpumap.h"
>> +#include "../../../util/header.h"
>> #include "../../../util/pmu.h"
>> #include "../../../util/pmus.h"
>> #include <api/fs/fs.h>
>> @@ -62,3 +63,23 @@ double perf_pmu__cpu_slots_per_cycle(void)
>> return slots ? (double)slots : NAN;
>> }
>> +
>> +double perf_pmu__no_stall_errata(void)
>
> While I like the approach of encoding the per-CPU support in the metric
> expression, I find that literal "no stall errata" is vague and could
> apply to any "stall errata" for any SoC for any architecture.
>
> If we were going to do it this way, then we would need a more specific
> name, like perf_pmu__no_stall_errata_arm64_errata123456(), but that
> should not be in the core perf code.
>
> Could we instead add a function to check cpuid and have something like
> this in the JSON:
>
> "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if
> (cpuid_less_than(410fd493)) else (stall_slot - cpu_cycles)) / (#slots *
> cpu_cycles))"
>
> I'm currently figuring out how cpuid_less_than() would be implemented
> (I'm no great python wrangler), but it would be along the lines of what
> Ian added for "has_event" in
> https://lore.kernel.org/linux-perf-users/20230623151016.4193660-1-irogers@xxxxxxxxxx/
>
> Thanks,
> John
Yeah, it looks like it could be done that way. Also, the way I added it,
it doesn't have access to the PMU type; it just does a generic
pmu__find_core_pmu(), so it won't work very well for heterogeneous systems.
If we're going to do a deeper modification of the expression parser, like
with has_event(), it might be possible to pass in the actual CPU ID that
the metric is running on, which would be better.
I'll have a look.
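For reference, a rough sketch of what the comparison behind something like
cpuid_less_than() might look like. This is only an illustration: the macro
and function names are hypothetical, not existing perf code, and it assumes
the CPU ID is the raw MIDR_EL1 value (implementer in bits [31:24], variant
in [23:20], part number in [15:4], revision in [3:0]), so 0x410fd493 decodes
as Arm, part 0xd49, r0p3. It only orders two IDs that have the same
implementer and part number.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extraction; names are hypothetical, layout as described above. */
#define MIDR_IMPLEMENTER(m)	(((m) >> 24) & 0xff)
#define MIDR_VARIANT(m)		(((m) >> 20) & 0xf)
#define MIDR_PARTNUM(m)		(((m) >> 4) & 0xfff)
#define MIDR_REVISION(m)	((m) & 0xf)

/*
 * Hypothetical helper: return true if 'midr' is the same part as 'ref'
 * but an older variant/revision (rXpY). Different implementers or parts
 * are never considered "older".
 */
static bool midr_older_than(uint64_t midr, uint64_t ref)
{
	if (MIDR_IMPLEMENTER(midr) != MIDR_IMPLEMENTER(ref) ||
	    MIDR_PARTNUM(midr) != MIDR_PARTNUM(ref))
		return false;

	/* Compare the major variant (rX) first, then the revision (pY). */
	if (MIDR_VARIANT(midr) != MIDR_VARIANT(ref))
		return MIDR_VARIANT(midr) < MIDR_VARIANT(ref);

	return MIDR_REVISION(midr) < MIDR_REVISION(ref);
}
```

Matching on implementer and part number first means an ID from a different
core never accidentally satisfies the comparison, which a plain integer
compare of the two values would not guarantee.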
James