Re: [PATCH v4 1/2] perf trace: Implement syscall summary in BPF

From: Howard Chu
Date: Fri Mar 28 2025 - 21:47:00 EST


Hello Namhyung,

On Tue, Mar 25, 2025 at 9:40 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> When -s/--summary option is used, it doesn't need (augmented) arguments
> of syscalls. Let's skip the augmentation and load another small BPF
> program to collect the statistics in the kernel instead of copying the
> data to the ring-buffer to calculate the stats in userspace. This will
> be much more light-weight than the existing approach and remove any lost
> events.
>
> Let's add a new option --bpf-summary to control this behavior. I cannot
> make it default because there's no way to get e_machine in the BPF which
> is needed for detecting different ABIs like 32-bit compat mode.
>
> No functional changes intended except for no more LOST events. :)
>
> $ sudo ./perf trace -as --summary-mode=total --bpf-summary sleep 1
>
> Summary of events:
>
> total, 6194 events
>
> syscall calls errors total min avg max stddev
> (msec) (msec) (msec) (msec) (%)
> --------------- -------- ------ -------- --------- --------- --------- ------
> epoll_wait 561 0 4530.843 0.000 8.076 520.941 18.75%
> futex 693 45 4317.231 0.000 6.230 500.077 21.98%
> poll 300 0 1040.109 0.000 3.467 120.928 17.02%
> clock_nanosleep 1 0 1000.172 1000.172 1000.172 1000.172 0.00%
> ppoll 360 0 872.386 0.001 2.423 253.275 41.91%
> epoll_pwait 14 0 384.349 0.001 27.453 380.002 98.79%
> pselect6 14 0 108.130 7.198 7.724 8.206 0.85%
> nanosleep 39 0 43.378 0.069 1.112 10.084 44.23%
> ...
>
> Cc: Howard Chu <howardchu95@xxxxxxxxx>
> Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> ---
> v4)
> * fix segfault on -S (Howard)
> * correct some comments (Howard)

+ if (!hashmap__find(hash, map_key->nr, &data)) {

I think you should mention the hashmap's map_key->nr update, as this
change is actually important for the feature.

>
> v3)
> * support -S/--with-summary option too (Howard)
> * make it work only with -a/--all-cpus (Howard)
> * fix stddev calculation (Howard)
> * add some comments about syscall_data (Howard)
>
> v2)
> * Rebased on top of Ian's e_machine changes
> * add --bpf-summary option
> * support per-thread summary
> * add stddev calculation (Howard)
>
> tools/perf/Documentation/perf-trace.txt | 6 +
> tools/perf/Makefile.perf | 2 +-
> tools/perf/builtin-trace.c | 54 ++-
> tools/perf/util/Build | 1 +
> tools/perf/util/bpf-trace-summary.c | 347 ++++++++++++++++++
> .../perf/util/bpf_skel/syscall_summary.bpf.c | 118 ++++++
> tools/perf/util/bpf_skel/syscall_summary.h | 25 ++
> tools/perf/util/trace.h | 37 ++
> 8 files changed, 577 insertions(+), 13 deletions(-)
> create mode 100644 tools/perf/util/bpf-trace-summary.c
> create mode 100644 tools/perf/util/bpf_skel/syscall_summary.bpf.c
> create mode 100644 tools/perf/util/bpf_skel/syscall_summary.h
> create mode 100644 tools/perf/util/trace.h
>
> diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
> index 887dc37773d0f4d6..a8a0d8c33438fef7 100644
> --- a/tools/perf/Documentation/perf-trace.txt
> +++ b/tools/perf/Documentation/perf-trace.txt
> @@ -251,6 +251,12 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
> pretty-printing serves as a fallback to hand-crafted pretty printers, as the latter can
> better pretty-print integer flags and struct pointers.
>
> +--bpf-summary::
> + Collect system call statistics in BPF. This is only for live mode and
> + works well with -s/--summary option where no argument information is
> + required.

It works with -S as well, doesn't it?

Anyway, I don't mind adding these details later on, so

Reviewed-by: Howard Chu <howardchu95@xxxxxxxxx>