Re: Re: [RFC PATCH tip 0/5] tracing filters with BPF

From: Jovi Zhangwei
Date: Sat Dec 07 2013 - 11:21:31 EST


On Sat, Dec 7, 2013 at 7:58 AM, Masami Hiramatsu
<masami.hiramatsu.pt@xxxxxxxxxxx> wrote:
> (2013/12/06 14:19), Jovi Zhangwei wrote:
>> Hi Alexei,
>>
>> On Thu, Dec 5, 2013 at 12:40 PM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>>> On Tue, Dec 3, 2013 at 4:01 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Can you do some performance comparison compared to e.g. ktap?
>>>>> How much faster is it?
>>>
>>> Did simple ktap test with 1M alloc_skb/kfree_skb toy test from earlier email:
>>> trace skb:kfree_skb {
>>> if (arg2 == 0x100) {
>>> printf("%x %x\n", arg1, arg2)
>>> }
>>> }
>>> 1M skb alloc/free 350315 (usecs)
>>>
>>> baseline without any tracing:
>>> 1M skb alloc/free 145400 (usecs)
>>>
>>> then equivalent bpf test:
>>> void filter(struct bpf_context *ctx)
>>> {
>>> void *loc = (void *)ctx->regs.dx;
>>> if (loc == 0x100) {
>>> struct sk_buff *skb = (struct sk_buff *)ctx->regs.si;
>>> char fmt[] = "skb %p loc %p\n";
>>> bpf_trace_printk(fmt, sizeof(fmt), (long)skb, (long)loc, 0);
>>> }
>>> }
>>> 1M skb alloc/free 183214 (usecs)
>>>
>>> so with one 'if' condition the difference ktap vs bpf is 350-145 vs 183-145
>>>
>>> obviously ktap is an interpreter, so it's not really fair.
>>>
>>> To make it really unfair I did:
>>> trace skb:kfree_skb {
>>> if (arg2 == 0x100 || arg2 == 0x200 || arg2 == 0x300 || arg2 == 0x400 ||
>>> arg2 == 0x500 || arg2 == 0x600 || arg2 == 0x700 || arg2 == 0x800 ||
>>> arg2 == 0x900 || arg2 == 0x1000) {
>>> printf("%x %x\n", arg1, arg2)
>>> }
>>> }
>>> 1M skb alloc/free 484280 (usecs)
>>>
>>> and corresponding bpf:
>>> void filter(struct bpf_context *ctx)
>>> {
>>> void *loc = (void *)ctx->regs.dx;
>>> if (loc == 0x100 || loc == 0x200 || loc == 0x300 || loc == 0x400 ||
>>> loc == 0x500 || loc == 0x600 || loc == 0x700 || loc == 0x800 ||
>>> loc == 0x900 || loc == 0x1000) {
>>> struct sk_buff *skb = (struct sk_buff *)ctx->regs.si;
>>> char fmt[] = "skb %p loc %p\n";
>>> bpf_trace_printk(fmt, sizeof(fmt), (long)skb, (long)loc, 0);
>>> }
>>> }
>>> 1M skb alloc/free 185660 (usecs)
>>>
>>> the difference is bigger now: 484-145 vs 185-145
>>>
>> There have big differences for compare arg2(in ktap) with direct register
>> access(ctx->regs.dx).
>>
>> The current argument fetching(arg2 in above testcase) implementation in ktap
>> is very inefficient, see ktap/interpreter/lib_kdebug.c:kp_event_getarg.
>> The only way to speedup is kernel tracing code change, let external tracing
>> module access event field not through list lookup. This work is not
>> started yet. :)
>
> I'm not sure why you can't access it directly from ftrace-event buffer.
> There is just a packed data structure and it is exposed via debugfs.
> You can decode it and can get an offset/size by using libtraceevent.
>
Then it means there need pass the event field info into kernel through trunk,
it looks strange because the kernel structure is the source of event field info,
it's like loop-back, and need to engage with libtraceevent in userspace.
(the side effect is it will make compilation slow, and consume more memory,
sometimes it will process 20K events in one script, like 'trace
probe:big_dso:*')

So "the only way" which I said is wrong, your approach indeed is another way.
I just think maybe use array instead of list for event fields would be more
efficient if list is not must needed. we can check it more in future.

Thanks.

Jovi.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/