[PATCH v6 1/8] perf evsel: Set off-cpu BPF output to system-wide

From: Howard Chu
Date: Fri Sep 27 2024 - 16:28:06 EST


Set pid = -1 when opening off-cpu's bpf-output event.

This makes 'perf record -p <PID> --off-cpu' and 'perf record --off-cpu
<workload>' work. Otherwise the bpf-output data cannot be collected.

The reason (a conjecture): say we open the perf_event on pid = 11451.
In BPF, we call bpf_perf_event_output() when a direct sample is ready
to be dumped. But at that point the perf_event of pid 11451 is not
__fully__ scheduled in yet. So in __bpf_perf_event_output()
(kernel/trace/bpf_trace.c), event->oncpu != cpu. The function then
returns -EOPNOTSUPP, and the direct off-cpu sample output fails:

	if (unlikely(event->oncpu != cpu))
		return -EOPNOTSUPP;
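To make the conjecture concrete, here is a toy user-space model of the oncpu check quoted above. It is not kernel code. The struct `model_event` and the helpers `model_oncpu()` and `model_output()` are hypothetical. The model rests on two assumptions. First, a task-bound event is only active on a CPU while its own task's perf context is scheduled in there. Second, a per-CPU event (pid = -1) is always active on its CPU.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model of a perf event (not kernel code). */
struct model_event {
	int pid; /* -1: per-CPU event for all tasks; otherwise a task id */
	int cpu; /* CPU the event was opened on */
};

/*
 * Model of event->oncpu: the CPU the event is currently active on, or -1.
 * ctx_pid is the task whose perf context is scheduled in on cur_cpu.
 */
static int model_oncpu(const struct model_event *ev, int cur_cpu, int ctx_pid)
{
	assert(cur_cpu >= 0);
	if (ev->cpu != cur_cpu)
		return -1;
	/* A per-CPU event is always active; a task event only for its task. */
	if (ev->pid == -1 || ev->pid == ctx_pid)
		return cur_cpu;
	return -1;
}

/* Mirrors the oncpu check in __bpf_perf_event_output(). */
static int model_output(const struct model_event *ev, int cur_cpu, int ctx_pid)
{
	if (model_oncpu(ev, cur_cpu, ctx_pid) != cur_cpu)
		return -EOPNOTSUPP;
	return 0; /* the real function would write the sample here */
}
```

Consider an event `{ .pid = 11451, .cpu = 0 }` while another task (say 42) still owns CPU 0's context. Here `model_output()` returns -EOPNOTSUPP. The same call on an event with `.pid = -1` returns 0.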

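So I'm making it pid = -1. The event is then opened per CPU for all
tasks and is not tied to any one task being scheduled in, so every task
can do bpf_perf_event_output() to it.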
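In user-space terms, the change amounts to calling perf_event_open(2) with pid = -1 and a specific cpu. Per the man page, that combination measures all processes/threads on the specified CPU. Below is a minimal sketch, assuming a Linux system with <linux/perf_event.h>. The helpers `fill_bpf_output_attr()` and `open_bpf_output_on_cpu()` are hypothetical and are not perf's API. The attribute values are illustrative; perf's own off-cpu evsel sets more fields.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a bpf-output software event. */
static void fill_bpf_output_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_BPF_OUTPUT;
	attr->sample_type = PERF_SAMPLE_RAW; /* BPF data arrives as raw samples */
	attr->sample_period = 1;             /* sample every event */
	attr->wakeup_events = 1;             /* wake the reader on every sample */
}

/*
 * Hypothetical helper: open the event for all tasks on one CPU
 * (pid = -1, cpu >= 0), as the patch does for the off-cpu evsel.
 * Returns the fd, or -1 with errno set.
 */
static int open_bpf_output_on_cpu(int cpu)
{
	struct perf_event_attr attr;

	assert(cpu >= 0); /* pid = -1 together with cpu = -1 is invalid */
	fill_bpf_output_attr(&attr);
	return (int)syscall(SYS_perf_event_open, &attr, /* pid */ -1, cpu,
			    /* group_fd */ -1, /* flags */ 0UL);
}
```

Calling `open_bpf_output_on_cpu(cpu)` once per CPU roughly mirrors perf opening one fd per entry in its cpu map. Per perf_event_open(2), such CPU-wide events may require CAP_PERFMON (or CAP_SYS_ADMIN) or a perf_event_paranoid value of 0 or lower.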

P.S. This is not necessary in perf trace, because it uses syscall
tracepoints instead of sched_switch.

Signed-off-by: Howard Chu <howardchu95@xxxxxxxxx>
---
tools/perf/util/evsel.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index edfb376f0611..500ca62669cb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2368,6 +2368,9 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 			test_attr__ready();
 
+			if (evsel__is_offcpu_event(evsel))
+				pid = -1;
+
 			/* Debug message used by test scripts */
 			pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx",
 				pid, perf_cpu_map__cpu(cpus, idx).cpu, group_fd, evsel->open_flags);
--
2.43.0