[tip:perf/urgent] perf bpf: Automatically add BTF ELF markers

From: tip-bot for Arnaldo Carvalho de Melo
Date: Sat Mar 09 2019 - 14:58:14 EST


Commit-ID: 3163613c5bc805dadac8ea157648eefd46747cae
Gitweb: https://git.kernel.org/tip/3163613c5bc805dadac8ea157648eefd46747cae
Author: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
AuthorDate: Fri, 1 Mar 2019 16:09:31 -0300
Committer: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
CommitDate: Wed, 6 Mar 2019 09:45:37 -0300

perf bpf: Automatically add BTF ELF markers

The libbpf loader expects ____btf_map_<MAP_NAME> structs to be in place
describing the key and value types of each map, so that those struct
definitions can be sent to the kernel via sys_bpf(fd, cmd =
BPF_BTF_LOAD) and later retrieved via sys_bpf(fd, cmd =
BPF_OBJ_GET_INFO_BY_FD) for use by tools such as 'bpftool map dump id
MAP_ID'.
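
To give an idea of the consumer side, here is a minimal sketch (an
illustration, not bpftool's actual code) of how a tool can ask the
kernel, via BPF_OBJ_GET_INFO_BY_FD, which BTF types describe a map's
key and value; it assumes a v4.18+ <linux/bpf.h>, where struct
bpf_map_info carries btf_id/btf_key_type_id/btf_value_type_id:

--- 8< ---

#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fetch the BTF type ids describing a map's key and value, so that
 * they can be looked up in the map's BTF and used to pretty-print the
 * bytes returned by BPF_MAP_LOOKUP_ELEM. */
static int map_btf_type_ids(int map_fd, __u32 *key_id, __u32 *value_id)
{
	struct bpf_map_info info;
	union bpf_attr attr;

	memset(&info, 0, sizeof(info));
	memset(&attr, 0, sizeof(attr));
	attr.info.bpf_fd   = map_fd;
	attr.info.info_len = sizeof(info);
	attr.info.info	   = (__u64)(unsigned long)&info;

	if (syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr)))
		return -1;

	*key_id	  = info.btf_key_type_id;   /* e.g. the id for 'int' */
	*value_id = info.btf_value_type_id; /* e.g. the id for 'struct syscall' */
	return 0;
}

--- 8< ---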

We already have a macro for defining maps in 'perf trace' BPF events:

bpf_map(name, _type, type_key, type_val, _max_entries)

As used in tools/perf/examples/bpf/augmented_raw_syscalls.c:

--- 8< ---

struct syscall {
	bool enabled;
};

bpf_map(syscalls, ARRAY, int, struct syscall, 512);

--- 8< ---

All we need, then, is to take that already available info and piggyback
on the 'bpf_map' macro in tools/perf/include/bpf/bpf.h, which is
included by 'perf trace' BPF programs, without requiring any changes to
the BPF programs that already define maps using 'bpf_map()'.
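
For illustration, after this patch the 'bpf_map(syscalls, ARRAY, int,
struct syscall, 512);' line shown above additionally expands to
something along these lines (reformatted for readability, see the
bpf.h diff at the end of this message):

--- 8< ---

struct ____btf_map_syscalls {
	int key;
	struct syscall value;
};
struct ____btf_map_syscalls
__attribute__((section(".maps.syscalls"), used)) ____btf_map_syscalls = { };

--- 8< ---

Its BTF then carries the key/value type pair for the 'syscalls' map,
which is exactly what the libbpf loader looks for.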

So this is what we have before this patch:

1) With this in ~/.perfconfig to dump .c events as .o files, i.e. to
save a copy so that we can later use the .o as pre-compiled BPF
bytecode:

# grep '\[llvm\]' -A2 ~/.perfconfig
[llvm]
dump-obj = true
clang-opt = -g

#
# clang --version
clang version 9.0.0 (https://git.llvm.org/git/clang.git/ 7906282d3afec5dfdc2b27943fd6c0309086c507) (https://git.llvm.org/git/llvm.git/ a1b5de1ff8ae8bc79dc8e86e1f82565229bd0500)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm/bin

2) Note the -g there, so that we get clang to generate debuginfo, and
since the target is 'bpf', this clang version (9.0) will generate the
BTF info as well.

3) Run a simple 'perf record' specifying as an event the augmented_raw_syscalls.c
source code:

# perf record -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
LLVM: dumping /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.025 MB perf.data ]

# file /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped

4) Look at the BTF structs encoded in it:

# pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
syscall_enter_args 64 0
augmented_filename 264 0
syscall 1 0
syscall_exit_args 24 0
bpf_map 28 0
#
# pahole -F btf -C syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
# pahole -F btf -C syscall /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
struct syscall {
	bool enabled; /* 0 1 */

	/* size: 1, cachelines: 1, members: 1 */
	/* last cacheline: 1 bytes */
};
#

5) OK, so with just this we don't have the markers expected by the
libbpf loader, and 'perf trace' will use this BPF bytecode when we run
it, because we have:

# grep '\[trace\]' -A1 ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
#

6) Let's do a 'perf trace' system-wide session using this BPF program:

# perf trace -e *mmsg,open*
Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR) = 106
Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
DNS Res~ver #3/23340 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 106
DNS Res~ver #3/23340 sendmmsg(106<socket:[3482690]>, 0x7f252f1fcaf0, 2, MSG_NOSIGNAL) = 2
Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR) = 106
lighttpd/18915 openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 12

7) While it runs, let's see the maps that 'perf trace' + libbpf's BPF
loader loaded into the kernel via sys_bpf(fd, BPF_MAP_CREATE, ...):

# bpftool map list | tail -6
149: perf_event_array name __augmented_sys flags 0x0
	key 4B value 4B max_entries 8 memlock 4096B
150: array name syscalls flags 0x0
	key 4B value 1B max_entries 512 memlock 8192B
151: hash name pids_filtered flags 0x0
	key 4B value 1B max_entries 64 memlock 8192B
#

8) Dump the "pids_filtered" map, which will have one entry per PID that
'perf trace' wants filtered. That includes its own PID, to avoid a
tracing feedback loop ('perf trace' shows the syscalls it makes, which
generates more syscalls that it then has to show...); it also
auto-filters the 'gnome-terminal' and 'sshd' parent PIDs, for the same
reason:

# bpftool map dump id 151
key: a5 0c 00 00 value: 01
key: 14 63 00 00 value: 01
Found 2 elements
#

9) Since there is no BTF info available, it does a generic hex dump :-\
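
For instance, the first key above, 'a5 0c 00 00', is just the
little-endian encoding of 0x00000ca5 = 3237, the same value that
appears, properly decoded, as one of the keys in the BTF-assisted dump
in step 13 below.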

10) Now, with this patch applied, we'll do steps 3 to 6 again and check
with pahole whether there are extra structs encoded in BTF:

# pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
syscall_enter_args 64 0
augmented_filename 264 0
syscall 1 0
syscall_exit_args 24 0
bpf_map 28 0
____btf_map___augmented_syscalls__ 8 0
____btf_map_syscalls 8 0
____btf_map_pids_filtered 8 0
#

11) Yes, those are ____btf_map_ + the map names; let's see what they look like:

# pahole -F btf -C ____btf_map_syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
struct ____btf_map_syscalls {
	int key; /* 0 4 */
	struct syscall value; /* 4 1 */

	/* size: 8, cachelines: 1, members: 2 */
	/* padding: 3 */
	/* last cacheline: 8 bytes */
};
#

12) Let's repeat step 7 to get the new map ids (a sketch of how libbpf
ties the BTF type info to each map follows this listing):

# bpftool map list | tail -6
155: perf_event_array name __augmented_sys flags 0x0
	key 4B value 4B max_entries 8 memlock 4096B
156: array name syscalls flags 0x0
	key 4B value 1B max_entries 512 memlock 8192B
157: hash name pids_filtered flags 0x0
	key 4B value 1B max_entries 64 memlock 8192B
#
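
Just to sketch how libbpf makes that association (an illustration using
the raw sys_bpf() interface, not libbpf's actual code; it assumes the
btf_fd/btf_key_type_id/btf_value_type_id fields added to union bpf_attr
around v4.18): when creating each map it passes the fd returned by
BPF_BTF_LOAD together with the BTF type ids of the map's key and value
types, roughly like:

--- 8< ---

#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create the 'syscalls' ARRAY map and tie it to the BTF blob that was
 * previously loaded with sys_bpf(BPF_BTF_LOAD, ...). */
static int create_syscalls_map(int btf_fd, __u32 key_type_id, __u32 value_type_id)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type	 = BPF_MAP_TYPE_ARRAY;
	attr.key_size	 = sizeof(int);
	attr.value_size	 = 1;	/* sizeof(struct syscall) */
	attr.max_entries = 512;
	strncpy(attr.map_name, "syscalls", sizeof(attr.map_name) - 1);

	attr.btf_fd	       = btf_fd;	/* fd from BPF_BTF_LOAD */
	attr.btf_key_type_id   = key_type_id;	/* BTF id for 'int' */
	attr.btf_value_type_id = value_type_id;	/* BTF id for 'struct syscall' */

	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

--- 8< ---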

13) And finally, let's dump the 'pids_filtered' map:

# bpftool map dump id 157
[{
"key": 3237,
"value": true
},{
"key": 26435,
"value": true
}
]
#

Looks much better! The BTF info was used to interpret the key as an
integer and the value as a struct with just one boolean member, so, to
make the output more compact, bpftool shows just the 'true' value where
before we saw '01'.

Now to make 'perf trace --dump-map' use BTF!

Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxx>
Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Luis Cláudio Gonçalves <lclaudio@xxxxxxxxxx>
Cc: Martin KaFai Lau <kafai@xxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Song Liu <songliubraving@xxxxxx>
Cc: Wang Nan <wangnan0@xxxxxxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Link: https://lkml.kernel.org/n/tip-ybuf9wpkm30xk28iq7jbwb40@xxxxxxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
---
tools/perf/include/bpf/bpf.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h
index 5df7ed9d9020..2eac6d804b2d 100644
--- a/tools/perf/include/bpf/bpf.h
+++ b/tools/perf/include/bpf/bpf.h
@@ -24,7 +24,13 @@ struct bpf_map SEC("maps") name = { \
 	.key_size = sizeof(type_key), \
 	.value_size = sizeof(type_val), \
 	.max_entries = _max_entries, \
-}
+}; \
+struct ____btf_map_##name { \
+	type_key key; \
+	type_val value; \
+}; \
+struct ____btf_map_##name __attribute__((section(".maps." #name), used)) \
+	____btf_map_##name = { }
 
 /*
  * FIXME: this should receive .max_entries as a parameter, as careful