Re: [PATCH v4 00/11] perf sched: Introduce stats tool

From: Ravi Bangoria

Date: Thu Dec 11 2025 - 22:43:52 EST


Hi Ian,

>>> Next is CPU scheduling statistics. These are simple diffs of
>>> /proc/schedstat CPU lines along with description. The report also
>>> prints % relative to base stat.
>
> I wonder if this is similar to user_time and system_time:
> ```
> $ perf list
> ...
> tool:
> ...
> system_time
> [System/kernel time in nanoseconds. Unit: tool]
> ...
> user_time
> [User (non-kernel) time in nanoseconds. Unit: tool]
> ...
> ```
> These events are implemented by reading /proc/stat and /proc/pid/stat:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/tool_pmu.c?h=perf-tools-next#n267
>
> As they are events then they can appear in perf stat output and also
> within metrics.

Do you mean creating synthesized events for each field of /proc/schedstat?

Your idea is interesting and, I suppose, will work best when we care
about individual counters. However, for the "perf sched stats" tool,
I see at least two challenges:

1. One of the design goals of "perf sched stats" was to keep the
overhead low. Currently, it reads /proc/schedstat once at the
beginning and once at the end. Switching to per-counter events
would require opening, reading and closing a large number of
events, which would incur significant overhead.

2. Taking a snapshot in one go allows us to correlate counts easily.
Using synthetic events would force us to read each counter
individually, making cross-counter correlation impossible.
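
To illustrate both points, here is a rough sketch of the snapshot-and-diff
idea. It is Python for brevity and is not the actual perf code. The function
names are hypothetical, and it assumes only that per-CPU lines start with
"cpu<N>" followed by integer counters:

```python
def read_schedstat(path="/proc/schedstat"):
    """Read the whole file in one go so all counters come from one snapshot."""
    with open(path) as f:
        return f.read()


def parse_cpu_lines(text):
    """Map 'cpuN' -> list of integer counters (hypothetical helper)."""
    snapshot = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-CPU lines; skip version, timestamp and domain lines.
        if fields and fields[0].startswith("cpu"):
            snapshot[fields[0]] = [int(f) for f in fields[1:]]
    return snapshot


def diff_snapshots(before, after):
    """Per-CPU, per-field difference between two parsed snapshots."""
    return {
        cpu: [a - b for a, b in zip(after[cpu], before[cpu])]
        for cpu in after
        if cpu in before
    }


if __name__ == "__main__":
    # Made-up sample data standing in for two reads of /proc/schedstat.
    start = parse_cpu_lines("cpu0 0 0 10 5 100\ncpu1 0 0 1 1 1\n")
    end = parse_cpu_lines("cpu0 0 0 14 9 160\ncpu1 0 0 3 2 7\n")
    print(diff_snapshots(start, end))
```

In real use, read_schedstat() would be called once before and once after the
workload, so the cost is two file reads no matter how many fields there are.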

Thanks,
Ravi