Re: [PATCH v6 10/12] perf tools: Improve IBS error handling

From: Stephane Eranian
Date: Mon Mar 14 2022 - 22:02:07 EST


Hi Kim,

On Tue, Feb 8, 2022 at 1:17 PM Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>
> From: Kim Phillips <kim.phillips@xxxxxxx>
>
> improve the error message returned on failed perf_event_open() on AMD when
> using IBS.
>
> Output of executing 'perf record -e ibs_op// true' BEFORE this patch:
>
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u).
> /bin/dmesg | grep -i perf may provide additional information.
>
> Output after:
>
> AMD IBS cannot exclude kernel events. Try running at a higher privilege level.
>
> Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch:
>
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//).
> /bin/dmesg | grep -i perf may provide additional information.
>
> Output after:
>
> Error:
> AMD IBS may only be available in system-wide/per-cpu mode. Try using -a, or -C and workload affinity
>
> Signed-off-by: Kim Phillips <kim.phillips@xxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Cc: Ian Rogers <irogers@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Michael Petlan <mpetlan@xxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Robert Richter <robert.richter@xxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> ---
> tools/perf/util/evsel.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 22d3267ce294..d42f63a484df 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2847,9 +2847,22 @@ static bool find_process(const char *name)
> return ret ? false : true;
> }
>
> +static bool is_amd(const char *arch, const char *cpuid)
> +{
> + return arch && !strcmp("x86", arch) && cpuid && strstarts(cpuid, "AuthenticAMD");
> +}
> +
> +static bool is_amd_ibs(struct evsel *evsel)
> +{
> + return evsel->core.attr.precise_ip || !strncmp(evsel->pmu_name, "ibs", 3);
> +}
> +
> int evsel__open_strerror(struct evsel *evsel, struct target *target,
> int err, char *msg, size_t size)
> {
> + struct perf_env *env = evsel__env(evsel);
> + const char *arch = perf_env__arch(env);
> + const char *cpuid = perf_env__cpuid(env);


This code crashes for me on the latest tip.git because env is NULL and
perf_env__cpuid() does not handle a NULL argument.
I don't quite know where this env global variable is set, but I hope
there is a better way of doing this, maybe using
the evsel__env() function in the same util/evsel.c file.
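
To illustrate the kind of guard I have in mind, here is a minimal
standalone sketch. The struct and every sketch_* name are made up for
illustration; they are not the real perf_env accessors, which do more
work than reading a field. The only point is that a NULL env should
yield NULL strings instead of a crash:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_env; only the fields used here. */
struct sketch_env {
	const char *arch;
	const char *cpuid;
};

/* Accessors that tolerate a NULL env by returning NULL. */
static const char *sketch_env_arch(const struct sketch_env *env)
{
	return env ? env->arch : NULL;
}

static const char *sketch_env_cpuid(const struct sketch_env *env)
{
	return env ? env->cpuid : NULL;
}

/* Same test as is_amd() in the patch, but safe when env is NULL. */
static bool sketch_env_is_amd(const struct sketch_env *env)
{
	const char *arch = sketch_env_arch(env);
	const char *cpuid = sketch_env_cpuid(env);

	return arch && !strcmp("x86", arch) && cpuid &&
	       !strncmp(cpuid, "AuthenticAMD", sizeof("AuthenticAMD") - 1);
}
```

With something along these lines, a missing env simply means "not AMD"
and the generic error message is printed.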

Similarly, is_amd_ibs() suffers from a NULL pointer dereference
because evsel->pmu_name may be NULL:

$ perf record -e rc2 .....

causes a NULL pmu_name.
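
A minimal sketch of a NULL-safe variant, again standalone: the
sketch_* types are hypothetical and carry only the two fields that
is_amd_ibs() looks at, so this is not a drop-in replacement for the
patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the evsel fields that is_amd_ibs() reads. */
struct sketch_attr {
	unsigned int precise_ip;
};

struct sketch_evsel {
	struct sketch_attr attr;
	const char *pmu_name;	/* may be NULL, e.g. for a raw event */
};

/* Same logic as is_amd_ibs(), with pmu_name checked before strncmp(). */
static bool sketch_is_amd_ibs(const struct sketch_evsel *evsel)
{
	return evsel->attr.precise_ip ||
	       (evsel->pmu_name && !strncmp(evsel->pmu_name, "ibs", 3));
}
```

An event without a pmu_name and without precise_ip is then simply
treated as a non-IBS event.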

Could you please send me an updated version to integrate with the
branch sampling code?

Thanks.


>
> char sbuf[STRERR_BUFSIZE];
> int printed = 0, enforced = 0;
>
> @@ -2949,6 +2962,17 @@ int evsel__open_strerror(struct evsel *evsel, struct target *target,
> return scnprintf(msg, size,
> "Invalid event (%s) in per-thread mode, enable system wide with '-a'.",
> evsel__name(evsel));
> + if (is_amd(arch, cpuid)) {
> + if (is_amd_ibs(evsel)) {
> + if (evsel->core.attr.exclude_kernel)
> + return scnprintf(msg, size,
> + "AMD IBS can't exclude kernel events. Try running at a higher privilege level.");
> + if (!evsel->core.system_wide)
> + return scnprintf(msg, size,
> + "AMD IBS may only be available in system-wide/per-cpu mode. Try using -a, or -C and workload affinity");
> + }
> + }
> +
> break;
> case ENODATA:
> return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. "
> --
> 2.35.0.263.gb82422642f-goog
>