Hi Swapnil,
On 3/11/2025 8:02 PM, Swapnil Sapkal wrote:
MOTIVATION
----------
Existing `perf sched` is quite exhaustive and provides lot of insights
into scheduler behavior but it quickly becomes impractical to use for
long running or scheduler intensive workload. For ex, `perf sched record`
has ~7.77% overhead on hackbench (with 25 groups each running 700K loops
on a 2-socket 128 Cores 256 Threads 3rd Generation EPYC Server), and it
generates huge 56G perf.data for which perf takes ~137 mins to prepare
and write it to disk [1].
Unlike `perf sched record`, which hooks onto set of scheduler tracepoints
and generates samples on a tracepoint hit, `perf sched stats record` takes
snapshot of the /proc/schedstat file before and after the workload, i.e.
there is almost zero interference on workload run. Also, it takes very
minimal time to parse /proc/schedstat, convert it into perf samples and
save those samples into perf.data file. Result perf.data file is much
smaller. So, overall `perf sched stats record` is much more light weight
compare to `perf sched record`.
We, internally at AMD, have been using this (a variant of this, known as
"sched-scoreboard"[2]) and found it to be very useful to analyse impact
of any scheduler code changes[3][4]. Prateek used v2[5] of this patch
series to report the analysis[6][7].
Please note that, this is not a replacement of perf sched record/report.
The intended users of the new tool are scheduler developers, not regular
users.
USAGE
-----
# perf sched stats record
# perf sched stats report
# perf sched stats diff
May I know the status of this patch set? I tested it on a 96 cores system and it works as expected in general.
One nit question:
Is perf.data and perf.data.old the default files
for comparison if no files are provided in
perf sched stats diff?
--
thanks,
Chenyu