Re: [PATCH RFC 0/3] perf: add logic to collect off-cpu samples
From: Ian Rogers
Date: Sun Jul 14 2024 - 14:33:00 EST
On Sat, Jul 13, 2024 at 12:43 AM Ajay Kaher <ajay.kaher@xxxxxxxxxxxx> wrote:
>
> On Fri, Jul 12, 2024 at 3:28 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Thu, Jul 11, 2024 at 5:16 AM Ajay Kaher <ajay.kaher@xxxxxxxxxxxx> wrote:
> > >
> > > Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
> > > samples.
> > >
> > > off-cpu samples represent time spent by task when it was on wait queue
> > > (schedule out to waiting for events, blocked on I/O, locks, timers,
> > > paging/swapping, etc)
> > >
> > > Refer following links for more details:
> > > https://lpc.events/event/17/contributions/1556/
> > > https://www.youtube.com/watch?v=sF2faKGRnjs
> >
> > Hi Ajay,
> >
> > I wonder if Howard's improvements (not landed) for `perf record
> > --off-cpu` would solve this problem for you?
> > https://lore.kernel.org/lkml/20240424024805.144759-1-howardchu95@xxxxxxxxx/
> > Or is that approach problematic due to the use of BPF?
> >
>
> Thanks Ian for your response and sharing Howard's improvements.
>
> Yes, perf --off-cpu is based upon BPF and having following restrictions:
>
> - target binary should be compiled with frame pointer, same mentioned
> in tools/perf/Documentation/perf-record.txt:
> Note that BPF can collect stack traces using frame pointer ("fp") only,
> as of now. So the applications built without the frame pointer might see
> bogus addresses.
Agreed, if you want more than the leaf function. There's some evidence
that frame pointers may become the default:
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
> - perf should be complied with BUILD_BPF_SKEL=1:
> Warning: option `off-cpu' is being ignored because no BUILD_BPF_SKEL=1
We've made this the default build behavior but probably a bigger issue
is that generally to have permissions you need to run as root.
> - off-cpu, on-cpu samples are not on the same result page.
> (I guess Howard has improve this, not tried his patches)
>
> I have tried to collect the off-cpu sample same as on-cpu sample with the help
> of kernel/events/core.c. We will get one off-cpu sample from the target task
> sched-out to sched-in. Or we can say off-cpu samples are not dependent on
> frequency provided by the user to perf record.
>
> I am also worried about having so many samples if sched-in/out
> frequency is high.
> Thinking to merge samples if attributes are the same (i.e. stacktrace)
> and add the
> off-cpu period to previous samples with the same attribute.
Right, this would be useful to look at in the context of Howard's
patches. The proposal in his changes are that short off-cpu times are
aggregated in a BPF map and dumped at the end of perf record - this
matches the existing perf record off-CPU behavior. Longer off-CPU
times, where long is open for debate and will be a parameter (perhaps
100 microseconds) and cause a sample to be created. BPF programs
create BPF output events, Howard's last patch series would rewrite the
events in perf record to be off-CPU samples. At the last office hours
it was discussed that we should dump the BPF output events directly,
to keep perf record overhead minimal, and add the ability to rewrite
into samples in tools like perf report, etc.
Thanks,
Ian
> -Ajay
>
> > Thanks,
> > Ian
> >
> > > Ajay Kaher (3):
> > > perf/core: add logic to collect off-cpu sample
> > > perf/record: add options --off-cpu-kernel
> > > perf/report: add off-cpu samples
> > >
> > > include/linux/perf_event.h | 16 ++++++++++++++
> > > include/uapi/linux/perf_event.h | 3 ++-
> > > kernel/events/core.c | 27 ++++++++++++++++++-----
> > > tools/include/uapi/linux/perf_event.h | 3 ++-
> > > tools/perf/builtin-record.c | 2 ++
> > > tools/perf/util/events_stats.h | 2 ++
> > > tools/perf/util/evsel.c | 4 ++++
> > > tools/perf/util/hist.c | 31 ++++++++++++++++++++++++---
> > > tools/perf/util/hist.h | 1 +
> > > tools/perf/util/record.h | 1 +
> > > tools/perf/util/sample.h | 1 +
> > > 11 files changed, 81 insertions(+), 10 deletions(-)
> > >
> > > --
> > > 2.39.0
> > >