Re: [RFC PATCH tip 4/5] use BPF in tracing filters
From: Masami Hiramatsu
Date: Wed Dec 04 2013 - 19:05:55 EST
(2013/12/04 10:11), Steven Rostedt wrote:
> On Wed, 04 Dec 2013 09:48:44 +0900
> Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx> wrote:
>
>> (2013/12/03 13:28), Alexei Starovoitov wrote:
>>> Such filters can be written in C and allow safe read-only access to
>>> any kernel data structure.
>>> Like systemtap but with safety guaranteed by kernel.
>>>
>>> The user can do:
>>> cat bpf_program > /sys/kernel/debug/tracing/.../filter
>>> if tracing event is either static or dynamic via kprobe_events.
>>>
>>> The program can be anything as long as bpf_check() can verify its safety.
>>> For example, the user can create kprobe_event on dst_discard()
>>> and use logically following code inside BPF filter:
>>> skb = (struct sk_buff *)ctx->regs.di;
>>> dev = bpf_load_pointer(&skb->dev);
>>> to access 'struct net_device'
>>> Since its prototype is 'int dst_discard(struct sk_buff *skb);'
>>> 'skb' pointer is in 'rdi' register on x86_64
>>> bpf_load_pointer() will try to fetch 'dev' field of 'sk_buff'
>>> structure and will suppress page-fault if pointer is incorrect.
>>
>> Hmm, I doubt it is a good way to integrate with ftrace.
>> I prefer to use this for replacing current ftrace filter,
>
> I'm not sure how we can do that. Especially since the bpf is very arch
> specific, and the current filters work for all archs.
My idea is to use BPF as an arch-specific optimization for the
ftrace filter. On other archs, the filter keeps working with the
current code. So ftrace holds filter_preds and compiles them into
BPF bytecode where possible.
This backend optimization can also be applied to fetch methods.
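
To illustrate, here is a rough userspace sketch of that idea, not kernel
code. The names struct pred, struct insn, filter_compile() and
filter_match() are hypothetical, and the toy bytecode only stands in for
BPF. The filter always keeps its predicates; a compiled program is an
optional fast path that is used only when compilation succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PREDS 8

/* Hypothetical stand-in for a filter predicate: "field OP value". */
enum pred_op { PRED_EQ, PRED_NE, PRED_GT, PRED_LT };

struct pred {
	size_t offset;		/* byte offset of a 64-bit field in the record */
	enum pred_op op;
	uint64_t val;
};

/* Toy bytecode (an assumption, not BPF): load field, compare, exit if false. */
struct insn {
	uint8_t opcode;
	uint16_t off;
	uint64_t imm;
};

struct filter {
	struct pred preds[MAX_PREDS];	/* always kept, works on every arch */
	int n_preds;
	struct insn prog[MAX_PREDS];	/* optional compiled form */
	int prog_len;			/* 0: nothing compiled, use preds */
};

static uint64_t load_u64(const void *rec, size_t off)
{
	uint64_t v;

	memcpy(&v, (const uint8_t *)rec + off, sizeof(v));
	return v;
}

static int compare(enum pred_op op, uint64_t a, uint64_t b)
{
	switch (op) {
	case PRED_EQ: return a == b;
	case PRED_NE: return a != b;
	case PRED_GT: return a > b;
	case PRED_LT: return a < b;
	}
	return 0;
}

/* Returns 0 on success, -1 if no backend exists or a predicate does not fit. */
static int filter_compile(struct filter *f, int arch_has_backend)
{
	assert(f->n_preds <= MAX_PREDS);
	f->prog_len = 0;
	if (!arch_has_backend)
		return -1;
	for (int i = 0; i < f->n_preds; i++) {
		const struct pred *p = &f->preds[i];

		if (p->offset > UINT16_MAX)
			return -1;	/* prog_len stays 0: fall back to preds */
		f->prog[i].opcode = (uint8_t)p->op;
		f->prog[i].off = (uint16_t)p->offset;
		f->prog[i].imm = p->val;
	}
	f->prog_len = f->n_preds;
	return 0;
}

/* All predicates are ANDed; both paths must give the same answer. */
static int filter_match(const struct filter *f, const void *rec)
{
	if (f->prog_len > 0) {
		for (int i = 0; i < f->prog_len; i++) {
			const struct insn *in = &f->prog[i];

			if (!compare((enum pred_op)in->opcode,
				     load_u64(rec, in->off), in->imm))
				return 0;
		}
		return 1;
	}
	for (int i = 0; i < f->n_preds; i++) {
		const struct pred *p = &f->preds[i];

		if (!compare(p->op, load_u64(rec, p->offset), p->val))
			return 0;
	}
	return 1;
}
```

Since the predicates stay in the filter, the user-visible interface does
not have to change, and a failed compile just means the slower path runs.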
>> fetch functions and actions. In that case, we can continue
>> to use current interface but much faster to trace.
>> Also, we can see what filter/arguments/actions are set
>> on each event.
>
> There's also the problem that the current filters work with the results
> of what is written to the buffer, not what is passed in by the trace
> point, as that isn't even displayed to the user.
Agreed, which is why I said I doubt this implementation is in good
shape to integrate. The ktap style is better, since it just gets
parameters from the perf buffer entry (using the event format).
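
Roughly, fetching a parameter that way could look like the sketch below.
This is plain userspace C written only for illustration: struct
field_desc and entry_get_field() are hypothetical names, and the
descriptor table is an assumed simplification of what an event format
provides (field name, offset, size). The filter only ever sees the
recorded entry, never the raw tracepoint arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one field of a recorded event entry. */
struct field_desc {
	const char *name;
	size_t offset;	/* byte offset inside the entry */
	size_t size;	/* 1, 2, 4 or 8 bytes, treated as unsigned */
};

/*
 * Look up a field by name and read it from the recorded entry.
 * Returns 0 on success, -1 if the field is unknown, has an
 * unsupported size, or would lie outside the entry.
 */
static int entry_get_field(const struct field_desc *fmt, size_t n_fields,
			   const void *entry, size_t entry_len,
			   const char *name, uint64_t *out)
{
	assert(fmt && entry && name && out);

	for (size_t i = 0; i < n_fields; i++) {
		const struct field_desc *f = &fmt[i];
		const uint8_t *src;

		if (strcmp(f->name, name) != 0)
			continue;
		if (f->offset > entry_len || f->size > entry_len - f->offset)
			return -1;
		src = (const uint8_t *)entry + f->offset;

		switch (f->size) {
		case 1: { uint8_t v;  memcpy(&v, src, sizeof(v)); *out = v; return 0; }
		case 2: { uint16_t v; memcpy(&v, src, sizeof(v)); *out = v; return 0; }
		case 4: { uint32_t v; memcpy(&v, src, sizeof(v)); *out = v; return 0; }
		case 8: { uint64_t v; memcpy(&v, src, sizeof(v)); *out = v; return 0; }
		default:
			return -1;
		}
	}
	return -1;
}
```

The bounds check against entry_len is what keeps such a lookup safe
without any page-fault handling, because it never follows a pointer out
of the buffer.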
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx