Re: [PATCH RFC 0/3] perf: add logic to collect off-cpu samples

From: Ajay Kaher
Date: Sat Jul 13 2024 - 03:43:15 EST


On Fri, Jul 12, 2024 at 3:28 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Thu, Jul 11, 2024 at 5:16 AM Ajay Kaher <ajay.kaher@xxxxxxxxxxxx> wrote:
> >
> > Add --off-cpu-kernel option to capture off-cpu sample alongwith on-cpu
> > samples.
> >
> > off-cpu samples represent time spent by task when it was on wait queue
> > (schedule out to waiting for events, blocked on I/O, locks, timers,
> > paging/swapping, etc)
> >
> > Refer following links for more details:
> > https://lpc.events/event/17/contributions/1556/
> > https://www.youtube.com/watch?v=sF2faKGRnjs
>
> Hi Ajay,
>
> I wonder if Howard's improvements (not landed) for `perf record
> --off-cpu` would solve this problem for you?
> https://lore.kernel.org/lkml/20240424024805.144759-1-howardchu95@xxxxxxxxx/
> Or is that approach problematic due to the use of BPF?
>

Thanks Ian for your response and sharing Howard's improvements.

Yes, perf --off-cpu is based upon BPF and having following restrictions:

- target binary should be compiled with frame pointer, same mentioned
in tools/perf/Documentation/perf-record.txt:
Note that BPF can collect stack traces using frame pointer ("fp") only,
as of now. So the applications built without the frame pointer might see
bogus addresses.

- perf should be complied with BUILD_BPF_SKEL=1:
Warning: option `off-cpu' is being ignored because no BUILD_BPF_SKEL=1

- off-cpu, on-cpu samples are not on the same result page.
(I guess Howard has improve this, not tried his patches)

I have tried to collect the off-cpu sample same as on-cpu sample with the help
of kernel/events/core.c. We will get one off-cpu sample from the target task
sched-out to sched-in. Or we can say off-cpu samples are not dependent on
frequency provided by the user to perf record.

I am also worried about having so many samples if sched-in/out
frequency is high.
Thinking to merge samples if attributes are the same (i.e. stacktrace)
and add the
off-cpu period to previous samples with the same attribute.

-Ajay

> Thanks,
> Ian
>
> > Ajay Kaher (3):
> > perf/core: add logic to collect off-cpu sample
> > perf/record: add options --off-cpu-kernel
> > perf/report: add off-cpu samples
> >
> > include/linux/perf_event.h | 16 ++++++++++++++
> > include/uapi/linux/perf_event.h | 3 ++-
> > kernel/events/core.c | 27 ++++++++++++++++++-----
> > tools/include/uapi/linux/perf_event.h | 3 ++-
> > tools/perf/builtin-record.c | 2 ++
> > tools/perf/util/events_stats.h | 2 ++
> > tools/perf/util/evsel.c | 4 ++++
> > tools/perf/util/hist.c | 31 ++++++++++++++++++++++++---
> > tools/perf/util/hist.h | 1 +
> > tools/perf/util/record.h | 1 +
> > tools/perf/util/sample.h | 1 +
> > 11 files changed, 81 insertions(+), 10 deletions(-)
> >
> > --
> > 2.39.0
> >