Re: [PATCH v1 0/3] Use BPF filters for a "perf top -u" workaround

From: Namhyung Kim
Date: Fri May 17 2024 - 21:22:17 EST


Hi Ian,

On Thu, May 16, 2024 at 10:34 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > Allow uid and gid to be terms in BPF filters by first breaking the
> > > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > > from a value in the sample. Allow filters to be passed to perf top, which allows:
> > >
> > > $ perf top -e cycles:P --filter "uid == $(id -u)"
> > >
> > > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > > due to processes/threads terminating between the /proc scan and the
> > > perf_event_open.
> >
> > Fwiw, something I noticed playing around with this (my workload was
> > `perf test -w noploop 100000` as different users) is that old samples
> > appeared to linger around making terminated processes still appear in
> > the top list. My guess is that there aren't other samples showing up
> > and pushing the old sample events out of the ring buffers due to the
> > filter. This can look quite odd and I don't know if we have a way to
> > improve upon it, flush the ring buffers, histograms, etc. It appears
> > to be a latent `perf top` issue that you could encounter on other low
> > frequency events, but I thought I'd mention it anyway.
>
> Some other thoughts:
>
> - It is kind of annoying with the --filter option (either on top or
> record) that there first needs to be an event to filter on. It'd be
> nice if we could just filter the default event.

Hmm.. right. It should work with the default event when
no -e option is given.

>
> - Should "perf top --uid=1234" be removed or turned into an alias
> for '--filter "uid == $(id -u)"' given the --uid option generally
> doesn't work?

I think --uid should not fail if it cannot find the task.
I had a similar situation for perf stat --for-each-cgroup
and made it ignore the failures.
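
A rough sketch of that "ignore the failure" idea, not the actual perf code:
when opening an event per task, a task that already exited between the scan
and the open (errno == ESRCH) is skipped instead of aborting the whole run.
The names open_for_tasks, open_fn and fake_open are hypothetical; in perf the
callback would presumably wrap perf_event_open.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback: returns an fd >= 0, or -1 with errno set. */
typedef int (*open_fn)(pid_t pid);

/*
 * Try to open an event for every task in pids[]. Tasks that vanished
 * between the scan and the open (errno == ESRCH) are skipped.
 * Returns the number of fds stored in fds[], or -1 on any other error.
 */
static int open_for_tasks(const pid_t *pids, size_t n, open_fn open_one,
			  int *fds)
{
	int opened = 0;

	for (size_t i = 0; i < n; i++) {
		int fd = open_one(pids[i]);

		if (fd < 0) {
			if (errno == ESRCH)
				continue;	/* task exited: ignore it */
			return -1;		/* any other error: give up */
		}
		fds[opened++] = fd;
	}
	return opened;
}

/* Stand-in for the real open: pretends odd pids have already exited. */
static int fake_open(pid_t pid)
{
	if (pid % 2) {
		errno = ESRCH;
		return -1;
	}
	return (int)pid + 100;
}
```

With this shape, a result of zero opened events can still be reported to the
user, but a single short-lived thread no longer makes the command fail.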

>
> - What should happen to the perf top --pid and --tid options, should
> they be filters? Should they fall back on /proc scanning if there
> aren't sufficient BPF permissions? The plumbing for that is going to
> be messy.

I'm not inclined to do such things.

>
> - There should probably be a way to filter on cgroups.

+1

>
> - Does the user care that there are 3 kinds of filter that will work
> differently? Could we break them apart to make it more explicit? I may
> want tracepoint events with a BPF filter. How can we ensure 1 syntax
> for the 3 kinds of filter?
>
> - Filtering on register values could be potentially interesting, for
> example, sampling on memcpy-s where the length is over a threshold. We
> have a register capture test:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
> Perhaps the filter could look something like 'perf record -g -e
> mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"' - this makes me
> think we need to make a more convenient way to specify memory
> addresses as symbols.

I've been thinking about a similar idea for uftrace.
It would filter the function based on the value of an
argument or a global variable.
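
To make the idea concrete, here is a minimal sketch of such a value filter,
assuming the argument (or register, or global) has already been captured as a
64-bit value. Everything here (filter_op, value_filter, value_filter_match) is
a hypothetical name for illustration, not an existing uftrace or perf
interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison operators for a value filter. */
enum filter_op { FILTER_EQ, FILTER_NE, FILTER_GT, FILTER_LT };

/* Hypothetical filter of the form "<captured value> <op> <threshold>". */
struct value_filter {
	enum filter_op op;
	uint64_t threshold;
};

/* Returns true if an entry carrying 'value' should be kept. */
static bool value_filter_match(const struct value_filter *f, uint64_t value)
{
	switch (f->op) {
	case FILTER_EQ:
		return value == f->threshold;
	case FILTER_NE:
		return value != f->threshold;
	case FILTER_GT:
		return value > f->threshold;
	case FILTER_LT:
		return value < f->threshold;
	}
	return false;	/* unknown operator: drop the entry */
}
```

For the memcpy case quoted above, the filter would be { FILTER_GT, 1024 } and
the captured value would be the length argument.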

Thanks,
Namhyung


> >
> > > Ian Rogers (3):
> > > perf bpf filter: Give terms their own enum
> > > perf bpf filter: Add uid and gid terms
> > > perf top: Allow filters on events
> > >
> > > tools/perf/Documentation/perf-record.txt | 2 +-
> > > tools/perf/Documentation/perf-top.txt | 4 ++
> > > tools/perf/builtin-top.c | 9 +++
> > > tools/perf/util/bpf-filter.c | 55 ++++++++++++----
> > > tools/perf/util/bpf-filter.h | 5 +-
> > > tools/perf/util/bpf-filter.l | 66 +++++++++----------
> > > tools/perf/util/bpf-filter.y | 7 +-
> > > tools/perf/util/bpf_skel/sample-filter.h | 27 +++++++-
> > > tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
> > > 9 files changed, 172 insertions(+), 70 deletions(-)
> > >
> > > --
> > > 2.45.0.rc1.225.g2a3ae87e7f-goog
> > >