Re: [PATCH v1] samples/bpf: Add a trace tool with perf PMU counters

From: Daniel Borkmann
Date: Mon Jan 20 2025 - 11:18:53 EST


Hi Leo,

On 1/19/25 4:33 PM, Leo Yan wrote:
Developers might need to profile a program at fine granularity. For
example, a use case is to account the CPU cycles for a small program or
for a specific function within the program.

This commit introduces a small tool that uses an eBPF program to read
the perf PMU counters for performance metrics. As a first step, four
counters are supported with the '-e' option: cycles, instructions,
branches, branch-misses.

The '-r' option is provided to support a raw event number. This option
is mutually exclusive with the '-e' option: users pass either a raw
event number or a counter name.
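
The '-e' / '-r' rule can be sketched in plain C as below. This is only an
illustration and not code from the patch: the struct event_sel type and the
select_event() helper are hypothetical names, and the sketch assumes the
option arguments have already been extracted from the command line (NULL
meaning "option not given").

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Counter names listed for the '-e' option in the commit message. */
static const char *const event_names[] = {
	"cycles", "instructions", "branches", "branch-misses",
};

/* Hypothetical result type: either a named counter or a raw event number. */
struct event_sel {
	int is_raw;	/* 1 if selected via '-r', 0 if via '-e' */
	int name_idx;	/* index into event_names[], -1 for raw events */
	uint64_t raw;	/* raw event number, valid only if is_raw */
};

/*
 * e_arg is the '-e' argument, r_arg is the '-r' argument; NULL means the
 * option was not given. Returns 0 on success or -EINVAL when both or
 * neither option is present, or when the argument cannot be parsed.
 */
static int select_event(const char *e_arg, const char *r_arg,
			struct event_sel *sel)
{
	assert(sel != NULL);

	/* Exactly one of the two options must be present. */
	if ((e_arg != NULL) == (r_arg != NULL))
		return -EINVAL;

	if (r_arg) {
		char *end;
		unsigned long long v;

		errno = 0;
		v = strtoull(r_arg, &end, 0);	/* base 0 accepts "0x5" */
		if (errno || end == r_arg || *end != '\0')
			return -EINVAL;
		sel->is_raw = 1;
		sel->name_idx = -1;
		sel->raw = v;
		return 0;
	}

	for (size_t i = 0; i < sizeof(event_names) / sizeof(event_names[0]); i++) {
		if (strcmp(e_arg, event_names[i]) == 0) {
			sel->is_raw = 0;
			sel->name_idx = (int)i;
			sel->raw = 0;
			return 0;
		}
	}
	return -EINVAL;
}
```

With this helper, passing both "cycles" and "0x5" is rejected, while either
one alone is accepted.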

The tool enables the counters for the entire trace session in free-run
mode. It reads the starting values of the counters when the profiled
program is scheduled in, and calculates the delta when the task is
scheduled out. The advantage of this approach is that it removes as much
statistical noise (e.g. caused by the tool itself) as possible.
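
The sched-in / sched-out accounting described above can be sketched in plain
C, outside of eBPF. This is a minimal illustration under stated assumptions,
not the patch's implementation: struct task_acct, MAX_CPUS and the on_sched_*
helpers are hypothetical names, and the counter value is passed in by the
caller instead of being read from a PMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 8	/* hypothetical limit, for the sketch only */

/* Per-task accounting state, one slot per CPU. */
struct task_acct {
	uint64_t start[MAX_CPUS];	/* counter value at the last sched-in */
	uint64_t total[MAX_CPUS];	/* accumulated deltas */
	int running[MAX_CPUS];		/* 1 while the task is on this CPU */
};

/* Record the free-running counter value when the task is scheduled in. */
static void on_sched_in(struct task_acct *a, int cpu, uint64_t counter_now)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	a->start[cpu] = counter_now;
	a->running[cpu] = 1;
}

/* Add the delta since sched-in when the task is scheduled out. */
static void on_sched_out(struct task_acct *a, int cpu, uint64_t counter_now)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	if (!a->running[cpu])
		return;		/* no matching sched-in, ignore */
	a->total[cpu] += counter_now - a->start[cpu];
	a->running[cpu] = 0;
}

/* Sum over all CPUs, like the "Total" line in the tool's output. */
static uint64_t acct_total(const struct task_acct *a)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < MAX_CPUS; i++)
		sum += a->total[i];
	return sum;
}
```

Because only the intervals between sched-in and sched-out are accumulated,
counter increments that happen while other tasks run on the same CPU do not
end up in the per-CPU totals.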

The tool also supports function-based tracing. By using the '-f' option,
users can specify the traced function. The eBPF program enables tracing
at the function entry and disables tracing upon exit from the function.
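
A rough plain C sketch of per-function accounting between entry and exit
follows. It is an assumption about how count, minimum, maximum and average
could be derived, not code taken from the patch: struct func_stats and the
func_* helpers are hypothetical names, and the average is assumed to be an
integer division of the summed deltas by the call count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function statistics, updated at entry and exit. */
struct func_stats {
	uint64_t count;		/* number of completed calls */
	uint64_t min;		/* smallest delta seen so far */
	uint64_t max;		/* largest delta seen so far */
	uint64_t sum;		/* sum of all deltas */
	uint64_t entry_value;	/* counter value at function entry */
	int in_func;		/* 1 between entry and exit */
};

/* Function entry: remember the counter value and start tracing. */
static void func_entry(struct func_stats *s, uint64_t counter_now)
{
	assert(s != NULL);
	s->entry_value = counter_now;
	s->in_func = 1;
}

/* Function exit: fold the delta into the statistics and stop tracing. */
static void func_exit(struct func_stats *s, uint64_t counter_now)
{
	uint64_t delta;

	assert(s != NULL);
	if (!s->in_func)
		return;		/* exit without a matching entry */

	delta = counter_now - s->entry_value;
	if (s->count == 0 || delta < s->min)
		s->min = delta;
	if (delta > s->max)
		s->max = delta;
	s->sum += delta;
	s->count++;
	s->in_func = 0;
}

/* Integer average of the recorded deltas; 0 if nothing was recorded. */
static uint64_t func_avg(const struct func_stats *s)
{
	return s->count ? s->sum / s->count : 0;
}
```

This sketch does not handle recursion or several threads calling the traced
function at the same time; a real implementation would need per-thread state.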

The '-u' option can be specified for tracing user mode only.

Below are several usage examples.

Trace CPU cycles for the whole program:

# ./trace_counter -e cycles -- /mnt/sort
Or
# ./trace_counter -e cycles /mnt/sort
Create process for the workload.
Enable the event cycles.
Bubble sorting array of 3000 elements
551 ms
Finished the workload.
Event (cycles) statistics:
+-----------+------------------+
| CPU[0000] |         29093250 |
+-----------+------------------+
| CPU[0002] |         75672820 |
+-----------+------------------+
| CPU[0006] |       1067458735 |
+-----------+------------------+
Total : 1172224805

Trace branches for the user mode only:

# ./trace_counter -e branches -u -- /mnt/sort
Create process for the workload.
Enable the event branches.
Bubble sorting array of 3000 elements
541 ms
Finished the workload.
Event (branches) statistics:
+-----------+------------------+
| CPU[0007] |         88112669 |
+-----------+------------------+
Total : 88112669

Trace instructions for the 'bubble_sort' function:

# ./trace_counter -f bubble_sort -e instructions -- /mnt/sort
Create process for the workload.
Enable the event instructions.
Bubble sorting array of 3000 elements
541 ms
Finished the workload.
Event (instructions) statistics:
+-----------+------------------+
| CPU[0006] |       1169810201 |
+-----------+------------------+
Total : 1169810201
Function (bubble_sort) duration statistics:
Count : 5
Minimum : 232009928
Maximum : 236742006
Average : 233962040

Trace the raw event '0x5' (L1D_TLB_REFILL):

# ./trace_counter -r 0x5 -u -- /mnt/sort
Create process for the workload.
Enable the raw event 0x5.
Bubble sorting array of 3000 elements
540 ms
Finished the workload.
Event (0x5) statistics:
+-----------+------------------+
| CPU[0007] |              174 |
+-----------+------------------+
Total : 174

Trace the function and set the CPU affinity for the profiled program:

# ./trace_counter -f bubble_sort -x /mnt/sort -e cycles \
-- taskset -c 2 /mnt/sort
Create process for the workload.
Enable the event cycles.
Bubble sorting array of 3000 elements
619 ms
Finished the workload.
Event (cycles) statistics:
+-----------+------------------+
| CPU[0002] |       1169913056 |
+-----------+------------------+
Total : 1169913056
Function (bubble_sort) duration statistics:
Count : 5
Minimum : 232054101
Maximum : 236769623
Average : 233982611

The command above sets the CPU affinity with the taskset command. The
profiled function 'bubble_sort' is in the executable '/mnt/sort', not in
the taskset binary. The '-x' option tells the tool the correct
executable path.

Signed-off-by: Leo Yan <leo.yan@xxxxxxx>
---
samples/bpf/Makefile | 7 +-
samples/bpf/trace_counter.bpf.c | 222 +++++++++++++
samples/bpf/trace_counter_user.c | 528 +++++++++++++++++++++++++++++++
3 files changed, 756 insertions(+), 1 deletion(-)
create mode 100644 samples/bpf/trace_counter.bpf.c
create mode 100644 samples/bpf/trace_counter_user.c

Thanks for this work! A few suggestions: the contents of samples/bpf/ are in the process of
being migrated into BPF selftests, given they have been bit-rotting for quite some time, so
we'd like to migrate missing coverage into BPF CI (see test_progs in
tools/testing/selftests/bpf/). That could be one option; an alternative is to extend bpftool
for profiling BPF programs (see 47c09d6a9f67 ("bpftool: Introduce "prog profile" command")).