Re: [PATCH v5 0/8] perf report: Add latency and parallelism profiling
From: Dmitry Vyukov
Date: Thu Feb 06 2025 - 13:42:01 EST
On Thu, 6 Feb 2025 at 19:30, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> Dmitry Vyukov <dvyukov@xxxxxxxxxx> writes:
>
> > There are two notions of time: wall-clock time and CPU time.
> > For a single-threaded program, or a program running on a single-core
> > machine, these notions are the same. However, for a multi-threaded/
> > multi-process program running on a multi-core machine, these notions are
> > significantly different. For each second of wall-clock time we have up
> > to number-of-cores seconds of CPU time.
>
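To make the difference concrete, here is a minimal Python sketch. It is
purely illustrative and not code from this series: cpu_and_wall_time() and
the (start, end) interval representation are made-up names, and it assumes
each thread is fully busy for the whole of its interval.

```python
def cpu_and_wall_time(intervals):
    """Return (cpu_seconds, wall_seconds) for a list of busy intervals.

    Each interval is a hypothetical (start, end) pair, in seconds,
    during which one thread is running on some core.
    """
    # CPU time: every thread's busy time counts separately.
    cpu = sum(end - start for start, end in intervals)

    # Wall-clock time: overlapping intervals count only once,
    # so sum the length of the union of all intervals.
    wall = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                wall += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        wall += cur_end - cur_start
    return cpu, wall


# A parallel phase (4 threads busy for 1 second each) followed by
# a serial phase (1 thread busy for 2 seconds).
parallel_phase = [(0, 1)] * 4
serial_phase = [(1, 3)]
cpu, wall = cpu_and_wall_time(parallel_phase + serial_phase)
print("CPU seconds:", cpu, "wall-clock seconds:", wall)
```

In this made-up workload the serial phase accounts for 2 of the 6 CPU
seconds but 2 of the 3 wall-clock seconds, so it looks minor in a CPU
profile while dominating the latency.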
> I'm curious how this interacts with the time / --time-quantum sort key.
>
> I assume it just works, but might be worth checking.
I will check later, but concrete commands to try would help:
I have never used --time-quantum before.
> It was intended to address some of these issues too.
>
> > Optimizing CPU overhead is useful to improve 'throughput', while
> > optimizing wall-clock overhead is useful to improve 'latency'.
> > These profiles are complementary and are not interchangeable.
> > Examples of where latency profile is needed:
> > - optimizing build latency
> > - optimizing server request latency
> > - optimizing ML training/inference latency
> > - optimizing running time of any command line program
> >
> > A CPU profile is useless for these use cases at best (if the user
> > understands the difference), or misleading at worst (if the user tries
> > to use the wrong profile for the job).
>
> I would agree in the general case, but not if the time sort key
> is chosen with a suitable quantum. You can then see how the parallelism
> changes over time, which is often a good enough proxy.
I have never used it. I will look at what capabilities it provides.
> > We still default to the CPU profile, so it's up to users to learn
> > about the second profiling mode and use it when appropriate.
>
> You should add it to tips.txt then
It is done in the docs patch.
> > .../callchain-overhead-calculation.txt   |   5 +-
> > .../cpu-and-latency-overheads.txt        |  85 ++++++++++++++
> > tools/perf/Documentation/perf-record.txt |   4 +
> > tools/perf/Documentation/perf-report.txt |  54 ++++++---
> > tools/perf/Documentation/tips.txt        |   3 +
> > tools/perf/builtin-record.c              |  20 ++++
> > tools/perf/builtin-report.c              |  39 +++++++
> > tools/perf/ui/browsers/hists.c           |  27 +++--
> > tools/perf/ui/hist.c                     | 104 ++++++++++++------
> > tools/perf/util/addr_location.c          |   1 +
> > tools/perf/util/addr_location.h          |   7 +-
> > tools/perf/util/event.c                  |  11 ++
> > tools/perf/util/events_stats.h           |   2 +
> > tools/perf/util/hist.c                   |  90 ++++++++++++---
> > tools/perf/util/hist.h                   |  32 +++++-
> > tools/perf/util/machine.c                |   7 ++
> > tools/perf/util/machine.h                |   6 +
> > tools/perf/util/sample.h                 |   2 +-
> > tools/perf/util/session.c                |  12 ++
> > tools/perf/util/session.h                |   1 +
> > tools/perf/util/sort.c                   |  69 +++++++++++-
> > tools/perf/util/sort.h                   |   3 +-
> > tools/perf/util/symbol.c                 |  34 ++++++
> > tools/perf/util/symbol_conf.h            |   8 +-
>
> We traditionally didn't do it, but in general the test coverage
> of perf report is too low, so I would recommend adding a simple
> test case to the perf test scripts.
Which parts of this are testable within the current testing framework?
Also, how do I run the tests? I could not figure it out.