Re: [PATCH] perf record: Allow passing perf's own pid to '--filter'

From: Arnaldo Carvalho de Melo
Date: Mon Jul 06 2015 - 09:57:00 EST


Em Mon, Jul 06, 2015 at 04:17:31AM +0000, Wang Nan escreveu:
> This patch allows passing perf's own PID to '--filter' by using
> '@PERFPID'. This should be useful when system-widely capturing
> tracepoints events.

Steven, does filters have any special meaning for @?

> Before this patch, when doing something like:
>
> # perf record -a -e syscalls:sys_enter_write <cmd>
>
> One could easily get result like this:
>
> # /tmp/perf report --stdio
> ...
> # Overhead Command Shared Object Symbol
> # ........ ....... .................. ....................
> #
> 99.99% perf libpthread-2.18.so [.] __write_nocancel
> 0.01% ls libc-2.18.so [.] write
> 0.01% sshd libc-2.18.so [.] write
> ...
>
> Where most events are generated by perf itself.
>
> A shell trick can be done to filter perf itself out:
>
> # cat << EOF > ./tmp
> > #!/bin/sh
> > exec perf record -e ... --filter="common_pid != \$\$" -a sleep 10
> > EOF
> # chmod a+x ./tmp
> # ./tmp
>
> However, doing so is user unfriendly.
>
> This patch introduces '@PERFPID' placeholder to '--filter' options. Now
> user is allowed to the above work with:
>
> # perf record -e ... --filter="common_pid != @PERFPID' sleep 10
>
> Signed-off-by: Wang Nan <wangnan0@xxxxxxxxxx>
> ---
> tools/perf/Documentation/perf-record.txt | 1 +

<SNIP>

> +++ b/tools/perf/util/parse-events.c
> @@ -1175,6 +1175,101 @@ int parse_events_option(const struct option *opt, const char *str,
> return ret;
> }
>
> +#ifndef PAGE_SIZE
> +# define PAGE_SIZE 4096
> +#endif

You can use 'page_size', its available and filled via:

page_size = sysconf(_SC_PAGE_SIZE);

early in perf's main() routine.

> +static int
> +postproc_filter_append_token(const char *key, char *new_filter,
> + ssize_t *pspace)
> +{
> + if (strcmp(key, "PERFPID") == 0) {
> + char pid_buf[32];
> + pid_t self_pid = getpid();
> +
> + snprintf(pid_buf, 32, "%d", self_pid);

snprintf(pid_buf, sizeof(pid_buf), "%d", self_pid);

> + strncat(new_filter, pid_buf, *pspace);
> + *pspace -= strlen(pid_buf);
> + if (*pspace < 0)
> + return -1;
> + return 0;
> + }
> +
> + return -1;

but then, please take a look at my perf/core branch, by coincidence I
worked on having multiple filters set on a evsel in 'perf trace', where
at a minimum, the tools' pid is added to the filter for all tracepoints
used, i.e.:

commom_pid != getpid()

is always present, to avoid a feedback loop, neverending tracing of
syscalls generated by the tracer itself.

Then, if you use --filter-pids PID-1,PID-2,PID3, it will create an
expression with that first part, for things like gnome-terminal, xorg,
etc.

Now we need to keep that in place and if the user uses -e to specify
which syscalls it wants (or wants filtered out), we need to again
concatenate with that commom_pid list, so that we call the filter ioctl
just once, else the kernel returns EEXIST.

Because I needed to append, etc, there are new functions there for
go on creating such expressions, please use them, its all in my
perf/core branch, the latest patches.

I.e. having something in the filter expression that gets transformed
into the tools' pid, I have no problem with that, just curious about
what would be the best character to signal that a substitution needs to
be performed, if it is really '@VAR', as my first selection would be
'$VAR', but then I haven't looked deeply at ftrace's filter stuff to see
if it has provision for substitution in the kernel, etc.

Andi also did this at some point, forgot why that wasn't applied at the
time :-\

For reference:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/core&id=f47805a2af3ba83881ca52434bbbc6e9886b72fd

- Arnaldo

> +}
> +
> +static void postproc_filter(struct perf_evsel *evsel)
> +{
> + char *at = NULL, *sep = NULL, *old_filter, *new_filter;
> + ssize_t space;
> +
> + if (!evsel->filter)
> + return;
> +
> + old_filter = evsel->filter;
> + at = strchr(old_filter, '@');
> + if (!at)
> + return;
> +
> + /*
> + * See perf_event_set_filter(). Length of a valid filter is
> + * limited by PAGE_SIZE.
> + */
> + new_filter = malloc(PAGE_SIZE);
> + if (!new_filter) {
> + fprintf(stderr, "No enough memory for post proc filter '%s'\n",
> + old_filter);
> + return;
> + }
> + *new_filter = '\0';
> + space = PAGE_SIZE - 1;
> +
> + while (1) {
> + if (at)
> + *at = '\0';
> + strncat(new_filter, old_filter, space);
> + space -= strlen(old_filter);
> + if (space < 0)
> + goto errout;
> + if (!at)
> + break;
> + *at = '@';
> +
> + sep = strchr(at + 1, ' ');
> + if (sep)
> + *sep = '\0';
> +
> + if (postproc_filter_append_token(at + 1, new_filter, &space))
> + goto errout;
> +
> + if (!sep)
> + break;
> + *sep = ' ';
> +
> + old_filter = sep;
> + at = strchr(old_filter, '@');
> + }
> +
> + free(evsel->filter);
> + /*
> + * It is safe to use new_filter directly. However, try to
> + * release some memory by strdup() a smaller string and free
> + * new_filter, which takes a full page.
> + */
> + evsel->filter = strdup(new_filter);
> + if (!evsel->filter)
> + evsel->filter = new_filter;
> + else
> + free(new_filter);
> + return;
> +errout:
> + if (at)
> + *at = '@';
> + if (sep)
> + *sep = ' ';
> + fprintf(stderr, "Can't post proc filter '%s'\n", evsel->filter);
> + free(new_filter);
> +}
> +
> int parse_filter(const struct option *opt, const char *str,
> int unset __maybe_unused)
> {
> @@ -1196,6 +1291,7 @@ int parse_filter(const struct option *opt, const char *str,
> return -1;
> }
>
> + postproc_filter(last);
> return 0;
> }
>
> --
> 1.8.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/