[PATCH v1 3/3] perf trace: Fix perf trace -p <PID>

From: Howard Chu
Date: Wed Jul 31 2024 - 15:50:51 EST


perf trace -p <PID> doesn't work on a syscall that's augmented (that is,
when it calls perf_event_output() in BPF). However, it does work when
the syscall is unaugmented.

Let's take open() as an example. open() is augmented in perf trace.

Before:
```
perf $ perf trace -e open -p 3792392
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
? ( ): ... [continued]: open()) = -1 ENOENT (No such file or directory)
```

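We can see that the timestamp, the duration, the process name and the
syscall arguments are all missing; only the return value is printed.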

After:
```
perf $ perf trace -e open -p 3792392
0.000 ( 0.123 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
1000.398 ( 0.116 ms): a.out/3792392 open(filename: "DINGZHEN", flags: WRONLY) = -1 ENOENT (No such file or directory)
```

Reason:

bpf_perf_event_output() will fail when you specify a pid in perf trace.

With perf trace -p 114, perf_event_open() is called with PID = 114 and
CPU = -1.

This is bad for the bpf-output event: opened this way, it doesn't accept
output from BPF's perf_event_output(), so the call fails.

Ideally, we would set PID = -1 every time we open a bpf-output event.
But PID = -1 together with CPU = -1 is illegal.
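
For reference, the pid/cpu combinations described in perf_event_open(2)
can be summarised in a small standalone sketch. The enum and the
classify_scope() helper are hypothetical names made up for illustration,
they are not part of perf. Values below -1 are treated as invalid here
for simplicity, and the cgroup mode (PERF_FLAG_PID_CGROUP) is ignored.

```c
/* Scope selected by the pid/cpu arguments of perf_event_open(2). */
enum perf_scope {
	SCOPE_INVALID,			/* pid == -1 && cpu == -1 */
	SCOPE_TASK_ANY_CPU,		/* pid >= 0,  cpu == -1   */
	SCOPE_TASK_ONE_CPU,		/* pid >= 0,  cpu >= 0    */
	SCOPE_ALL_TASKS_ONE_CPU,	/* pid == -1, cpu >= 0    */
};

/* Hypothetical helper: classify a (pid, cpu) pair. */
static enum perf_scope classify_scope(int pid, int cpu)
{
	/* Simplification: anything below -1 is rejected. */
	if (pid < -1 || cpu < -1)
		return SCOPE_INVALID;

	if (pid == -1)
		return cpu == -1 ? SCOPE_INVALID : SCOPE_ALL_TASKS_ONE_CPU;

	return cpu == -1 ? SCOPE_TASK_ANY_CPU : SCOPE_TASK_ONE_CPU;
}
```

Under these assumptions, perf trace -p 114 corresponds to
classify_scope(114, -1), the task-on-any-CPU case, while the only
system-wide form available is one event per CPU.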

So we have to open the bpf-output event once for every CPU, that is:
PID = -1, CPU = 0
PID = -1, CPU = 1
PID = -1, CPU = 2
PID = -1, CPU = 3
...

This patch does just that.
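
As a rough illustration of the resulting behaviour, here is a minimal
standalone sketch of which (pid, cpu) pairs end up being opened. It
models the two changes in this patch: the whole evlist gets a per-CPU
map when it contains a bpf-output event, and the bpf-output evsel itself
is opened with pid = -1. struct open_target and plan_open_targets() are
hypothetical names that do not exist in perf, and the CPUs are assumed
to be numbered 0..nr_cpus-1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pair of arguments handed to perf_event_open(). */
struct open_target {
	int pid;
	int cpu;
};

/*
 * Fill 'out' with the (pid, cpu) pairs used to open one evsel.
 * Returns the number of pairs written, or -1 if 'out' is too small.
 */
static int plan_open_targets(bool evlist_has_bpf_output,
			     bool evsel_is_bpf_output,
			     int target_pid, int nr_cpus,
			     struct open_target *out, size_t out_len)
{
	int i;

	if (!evlist_has_bpf_output) {
		/* Old behaviour: follow the target on any CPU. */
		if (out_len < 1)
			return -1;
		out[0].pid = target_pid;
		out[0].cpu = -1;
		return 1;
	}

	if (nr_cpus < 0 || out_len < (size_t)nr_cpus)
		return -1;

	/* Per-CPU map; only the bpf-output evsel drops the pid. */
	for (i = 0; i < nr_cpus; i++) {
		out[i].pid = evsel_is_bpf_output ? -1 : target_pid;
		out[i].cpu = i;
	}

	return nr_cpus;
}
```

With target_pid = 114 and nr_cpus = 4, the bpf-output evsel gets
(-1, 0), (-1, 1), (-1, 2), (-1, 3), matching the list above.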

You can test it with this program:
```
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	/* flags = 1 is O_WRONLY; the mode argument is arbitrary */
	int i1 = 1, i2 = 2;
	char s1[] = "DINGZHEN";

	/* call open() once per second so perf trace has events to show */
	while (1) {
		syscall(SYS_open, s1, i1, i2);
		sleep(1);
	}

	return 0;
}
```

Save it as open.c, compile it, run it, and get the PID:
```
gcc open.c

./a.out

# in a different window
ps aux | grep a.out
```

Then run perf trace:
```
perf trace -p <PID-You-just-got> -e open
```

Note that 'perf trace <workload>' is a little broken after this pid
fix, so 'perf trace -e open ./a.out' doesn't work; please get the PID
by hand.

Signed-off-by: Howard Chu <howardchu95@xxxxxxxxx>
---
tools/perf/util/evlist.c | 14 +++++++++++++-
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 3 +++
3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3a719edafc7a..d32f4f399ddd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1063,7 +1063,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
if (!threads)
return -1;

- if (target__uses_dummy_map(target))
+ if (target__uses_dummy_map(target) && !evlist__has_bpf_output(evlist))
cpus = perf_cpu_map__new_any_cpu();
else
cpus = perf_cpu_map__new(target->cpu_list);
@@ -2556,3 +2556,15 @@ void evlist__uniquify_name(struct evlist *evlist)
}
}
}
+
+bool evlist__has_bpf_output(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel__is_bpf_output(evsel))
+ return true;
+ }
+
+ return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cb91dc9117a2..09a6114daf8b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -443,5 +443,6 @@ int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
void evlist__check_mem_load_aux(struct evlist *evlist);
void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_list);
void evlist__uniquify_name(struct evlist *evlist);
+bool evlist__has_bpf_output(struct evlist *evlist);

#endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..0531efdf54e2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2282,6 +2282,9 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,

test_attr__ready();

+ if (evsel__is_bpf_output(evsel))
+ pid = -1;
+
/* Debug message used by test scripts */
pr_debug2_peo("sys_perf_event_open: pid %d cpu %d group_fd %d flags %#lx",
pid, perf_cpu_map__cpu(cpus, idx).cpu, group_fd, evsel->open_flags);
--
2.45.2