Re: [PATCH V2 00/10] perf script: Add API for filtering via dynamically loaded shared object

From: Adrian Hunter
Date: Mon Jun 28 2021 - 03:23:06 EST

On 27/06/21 7:13 pm, Andi Kleen wrote:
> On 6/27/2021 6:18 AM, Adrian Hunter wrote:
>> Hi
>>
>> In some cases, users want to filter very large amounts of data
>> (e.g. from AUX area tracing like Intel PT) looking for something
>> specific. While scripting such as Python can be used, Python is 10
>> to 20 times slower than C. So define a C API so that custom filters
>> can be written and loaded.
> While I appreciate this for complex cases, in my experience filtering
> is usually just a simple expression. It would be nice to also have a
> way to do this reasonably fast without having to write a custom C

I do not agree that writing C filters is a hassle, e.g. a minimal do-nothing
filter is only a few lines:

#include <perf/perf_dlfilter.h>

int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	return 0;
}

(Actually, the filter program does not need any lines of code at all, but that
is not much of an example.)
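
As a rough sketch of the "simple expression" case, a filter that keeps only
samples from one CPU could look something like the following. To keep it
compilable without the perf headers, struct mock_sample is a hypothetical
stand-in for struct perf_dlfilter_sample with only a cpu field, and
keep_cpu_filter() and wanted_cpu are made-up names. It also assumes the
convention that returning 0 keeps a sample and returning 1 filters it out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_dlfilter_sample (only the field used here) */
struct mock_sample {
	int cpu;
};

/* Hypothetical setting: the only CPU whose samples are kept */
static int wanted_cpu = 2;

/*
 * Assumed return convention: 0 keeps the sample, 1 filters it out.
 * In a real filter this expression would be the body of filter_event().
 */
int keep_cpu_filter(const struct mock_sample *sample)
{
	assert(sample != NULL);
	return sample->cpu == wanted_cpu ? 0 : 1;
}
```

The whole filter is the one-line expression in the return statement, so a
different condition only needs that line changed.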

Additionally, a script to do the build is fairly trivial e.g. I use this:

$ cat `which `

set -ex

if test -z "${1}" ; then
	echo "Name required"
	exit 1
fi


if test "${name}" = "${1}" ; then

gcc -c -I ~/include -fpic "${name}.c"

gcc -shared -o "${name}.so" "${name}.o"

> file. Is the 10x-20x overhead just the python interpreter, or is it
> related to perf?

AFAICT the Python C API used to interface to Python performs fairly similarly
to the Python interpreter.

> Maybe we could have some kind of python fast path
> just for filters?

I expect there are ways to make it more efficient, but I doubt it would ever
come close to C.

> Or maybe the alternative would be to have a
> frontend in perf that can automatically generate/compile such a C
> filter based on a simple expression, but I'm not sure if that would
> be much simpler.

If gcc is available, perf script could, in fact, build the .so on the fly
since the compile time is very quick.
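
To illustrate the idea only (this is not how perf script does anything today),
here is a minimal sketch of building a filter on the fly from C. The helper
names name_is_safe(), format_build_cmds() and build_filter() are hypothetical;
the gcc options are the ones from the build script above, and the commands are
run through the shell with system(), so the name is restricted to characters
that are safe to paste into a command line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: accept only names that are safe to paste into a shell command */
static int name_is_safe(const char *name)
{
	if (!name || !*name)
		return 0;
	for (; *name; name++) {
		if (!isalnum((unsigned char)*name) && *name != '_' && *name != '-')
			return 0;
	}
	return 1;
}

/*
 * Hypothetical helper: format the compile and link commands using the same
 * gcc options as the build script. Returns 0 on success, or -1 if the name
 * is rejected or a buffer is too small.
 */
int format_build_cmds(const char *name, char *cc, size_t cc_sz, char *ld, size_t ld_sz)
{
	int n;

	assert(cc != NULL && ld != NULL);
	if (!name_is_safe(name))
		return -1;
	n = snprintf(cc, cc_sz, "gcc -c -I ~/include -fpic \"%s.c\"", name);
	if (n < 0 || (size_t)n >= cc_sz)
		return -1;
	n = snprintf(ld, ld_sz, "gcc -shared -o \"%s.so\" \"%s.o\"", name, name);
	if (n < 0 || (size_t)n >= ld_sz)
		return -1;
	return 0;
}

/* Hypothetical helper: run both commands via the shell; returns 0 only if both succeed */
int build_filter(const char *name)
{
	char cc[512], ld[512];

	if (format_build_cmds(name, cc, sizeof(cc), ld, sizeof(ld)))
		return -1;
	if (system(cc) != 0)
		return -1;
	return system(ld) == 0 ? 0 : -1;
}
```

A caller would run build_filter("myfilter") with myfilter.c in the current
directory and then load the resulting myfilter.so in the usual way.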

Another point is that filters can be used for more than just filtering.
Here is an example which sums cycles per CPU and prints, at the beginning of
each line, the running total and the difference since the previous print. I
think this was something you were interested in doing?

#include <perf/perf_dlfilter.h>
#include <stdio.h>

#define MAX_CPU 4096

__u64 cycles[MAX_CPU];
__u64 cycles_rpt[MAX_CPU];

int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	__s32 cpu = sample->cpu;

	/* Accumulate cycles per CPU, ignoring samples with no valid CPU */
	if (cpu >= 0 && cpu < MAX_CPU)
		cycles[cpu] += sample->cyc_cnt;
	return 0;
}

int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
{
	__s32 cpu = sample->cpu;

	if (cpu >= 0 && cpu < MAX_CPU) {
		/* Total so far, then the delta since this CPU was last printed */
		printf("%10llu %10llu ", cycles[cpu], cycles[cpu] - cycles_rpt[cpu]);
		cycles_rpt[cpu] = cycles[cpu];
	} else {
		/* Pad to the same width so the columns stay aligned */
		printf("%22s", "");
	}
	return 0;
}

const char *filter_description(const char **long_description)
{
	return "Print the number of cycles at the start of each line";
}