Re: [RFC PATCH 1/5] bpf: Put perf_events check ahead of bpf prog

From: Wangnan (F)
Date: Thu Jul 02 2015 - 01:54:55 EST




On 2015/7/2 11:50, Alexei Starovoitov wrote:
> On 6/30/15 7:57 PM, He Kuang wrote:
>> When we add a kprobe point and record events by perf, the execution path
>> of all threads on each cpu will enter this point, but perf may only
>> record events on a particular thread or cpu at this kprobe point, a
>> check on call->perf_events list filters out the threads which perf is
>> not recording.
>
> I think there is a better way to do that. You're adding artificial
> per_cpu filtering whereas you really need per_pid filtering.


I think the difference between you and He Kuang is the order of
filtering. In He Kuang's view, perf's original filtering mechanisms
(implicit or explicit) should take precedence over the BPF filter, because
what the user wants is to filter events with *an additional* BPF filter.
So the filters should run in the following order:

event -> X -> Y -> Z -> BPF filter +-> perf.data
                                   |
                                   `-> dropped

(In the above diagram, X represents limitations which prevent an event
from being triggered, for example kprobe reentry. Y represents implicit
filters, like the check of call->perf_events, which is used to filter
out events from other CPUs (per-pid perf event filtering is also done
by it). Z represents the explicit filter which the user sets with
PERF_EVENT_IOC_SET_FILTER.)

So only those events which perf would collect without the BPF filter
should be passed to the BPF program.

The above is our understanding of ideal BPF filters.
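
To make the ordering concrete, here is a small user-space sketch of the
ideal pipeline. It is only a model, not kernel code: struct event,
struct recorder, the stage_*() helpers and keep_even() are all
hypothetical names invented for illustration, and the "value >=
min_value" test merely stands in for whatever filter the user set with
PERF_EVENT_IOC_SET_FILTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one probe hit; not a kernel structure. */
struct event {
	bool reentered;	/* X: e.g. kprobe reentry, event cannot fire */
	int cpu;	/* CPU the probe was hit on */
	int pid;	/* task that hit the probe */
	int value;	/* payload examined by the explicit and BPF filters */
};

/* Hypothetical model of what perf was asked to record. */
struct recorder {
	int cpu;	/* -1 means any CPU */
	int pid;	/* -1 means any task */
	int min_value;	/* stand-in for the explicit filter (Z) */
	bool (*bpf_filter)(const struct event *e);
	unsigned long bpf_calls;	/* how often the BPF stand-in ran */
};

/* X: limitations that prevent the event from being triggered at all. */
static bool stage_x(const struct event *e)
{
	return !e->reentered;
}

/* Y: implicit filter, models the call->perf_events check. */
static bool stage_y(const struct recorder *r, const struct event *e)
{
	return (r->cpu < 0 || r->cpu == e->cpu) &&
	       (r->pid < 0 || r->pid == e->pid);
}

/* Z: explicit filter set by the user. */
static bool stage_z(const struct recorder *r, const struct event *e)
{
	return e->value >= r->min_value;
}

/* Ideal order: X, Y, Z, and only then the additional BPF filter. */
bool ideal_should_record(struct recorder *r, const struct event *e)
{
	assert(r != NULL && e != NULL && r->bpf_filter != NULL);

	if (!stage_x(e) || !stage_y(r, e) || !stage_z(r, e))
		return false;	/* perf would drop it anyway, skip BPF */

	r->bpf_calls++;
	return r->bpf_filter(e);
}

/* Example BPF stand-in: keep only events with an even value. */
bool keep_even(const struct event *e)
{
	return e->value % 2 == 0;
}
```

With a recorder bound to one pid, an event from another task is dropped
at Y and never reaches the BPF stand-in, so bpf_calls stays unchanged.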

Therefore, to create an ideal BPF filter, it would be better to put BPF
filters into perf_tp_filter_match().

In the current implementation, BPF filters take effect in the middle
of kprobe event processing:

event -> X -> BPF filter -> Y -> Z +-> perf.data
                                   |
                                   `-> dropped

And this patch changes the ordering to:

event -> X -> Y -> BPF filter -> Z +-> perf.data
                                   |
                                   `-> dropped

Neither is ideal, but He Kuang's patch moves the BPF filter in the right
direction. It uses a relatively low-cost operation (the check of
call->perf_events) to reduce how often BPF filters need to be called.
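
The effect on the number of BPF program runs can be illustrated with a
toy model. This is only a sketch under simplifying assumptions: every
sample is assumed to have already passed X, and enum ordering,
struct sample and count_bpf_runs() are hypothetical names, not anything
in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three orderings discussed in this mail. */
enum ordering {
	ORDER_CURRENT,	/* X -> BPF -> Y -> Z */
	ORDER_PATCHED,	/* X -> Y -> BPF -> Z */
	ORDER_IDEAL	/* X -> Y -> Z -> BPF */
};

/* One probe hit that already passed X (hypothetical model). */
struct sample {
	bool passes_y;	/* would pass the call->perf_events check */
	bool passes_z;	/* would pass the explicit user filter */
};

/* Count how many times the BPF program runs under a given ordering. */
unsigned long count_bpf_runs(enum ordering order,
			     const struct sample *s, size_t n)
{
	unsigned long runs = 0;
	size_t i;

	assert(s != NULL || n == 0);

	for (i = 0; i < n; i++) {
		switch (order) {
		case ORDER_CURRENT:
			runs++;		/* BPF sees every probe hit */
			break;
		case ORDER_PATCHED:
			if (s[i].passes_y)
				runs++;	/* only hits perf is recording */
			break;
		case ORDER_IDEAL:
			if (s[i].passes_y && s[i].passes_z)
				runs++;	/* only hits perf would keep */
			break;
		}
	}
	return runs;
}
```

For a mix of samples where only some pass Y and fewer pass both Y and Z,
the patched ordering runs the BPF program less often than the current
one, and the ideal ordering runs it least.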

I'd like to discuss with you whether our understanding is correct. Do
you have any strong reason to put BPF filters at such an early stage?

Thank you.
