Re: ktap and ebpf integration

From: Jovi Zhangwei
Date: Sat Apr 05 2014 - 17:27:25 EST


On Sun, Apr 6, 2014 at 1:22 AM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
> On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> wrote:
>> On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>>
>>> 'ktap syntax' from user space point of view, can use ibpf as-is.
>>> Show me the script and I can show how ibpf can run it.
>>
>> Well, please leave 'ktap syntax' out of this. If you think
>> "integration" only means that the ktap compiler compiles ktap syntax
>> into BPF bytecode, then you have misunderstood what the real
>> problem is. Some ktap samples are below:
>
> Great. Nice examples.
> To better understand how they map to bpf architecture
> consider what bpf is:
> - bpf instruction set = assembler code
> - one bpf program = one function
> - obj_file generated by ktap or C compiler consists of multiple
> bpf programs (functions) and each one attaches to one or
> multiple events
> - events are [ku]probe/tracepoint events including
> init and fini events
> - bpf program cannot have loops or call other bpf programs,
> though it can call safe kernel functions like bpf_printk,
> bpf_gettimeofday, bpf_getpid, etc
> - one of such calls is 'bpf_load_pointer' = non-faulting access
> to any memory
> - another call is 'bpf_table_lookup' that does table lookup
> - bpf tables are not part of execution engine.
> tables are owned by kernel. User space can access them
> via netlink and may be through other mechanisms (like debugfs)
> Normal kernel C functions (like bpf_table_update) can access
> them in parallel.
> 'tables' is a mechanism to pass data between bpf programs
> and between bpf program and userspace
>

Below, you seem to blur the line between "it's already supported"
and "it will be supported".

>> 1). trace syscalls:* { print(argstr) }
>> This registers many events.
>> I posted this script in a previous mail, but didn't get an answer
>> on how to support it in BPF.
>> Note that ktap implements this with a library function
>> (kdebug.trace_by_id), without changing the object file. Can BPF do this?
>
> yes. should be clear from above explanation.
>
You still haven't given me a clear answer on how to register multiple
events in one script. You use the term "attach" for event registration, so I
guess it means using "cat *.bpf > /sys/kernel/***/event/filter" to attach, right?
In my view, event registration should be self-described in the
script. Why is another command line needed for event registration?
Does that mean the user has to run "cat" many times to register multiple events?
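
To make the point concrete, here is a toy Python sketch (not ktap or BPF
code; the event list and the name expand_events are made up for
illustration) of what self-described registration could look like: the
script carries a pattern such as "syscalls:*", and the loader expands it
into every matching event in one step, with no per-event command line.

```python
from fnmatch import fnmatchcase

# Hypothetical event list; a real tracer would read the names from the kernel.
AVAILABLE_EVENTS = [
    "syscalls:sys_enter_open",
    "syscalls:sys_exit_open",
    "syscalls:sys_enter_futex",
    "sched:sched_switch",
]


def expand_events(pattern, available=AVAILABLE_EVENTS):
    """Return every available event whose name matches the glob pattern."""
    return [name for name in available if fnmatchcase(name, pattern)]


# One pattern in the script selects all syscall events at once.
matched = expand_events("syscalls:*")
print(matched)
```

The same helper would handle "*:*" or a single fully named event, so the
script alone decides which events get registered.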

>> 2). print("hello world")
>> This is the simplest hello world script in ktap. Note that the
>> executing context is not the probe context but the main ktap
>> context. The BPF main context only allows declaring tables,
>> nothing else.
>> (You may think this hello world script is not useful, but that is not
>> true. Many scripts don't have to run in probe context; for
>> example, a script may just want to read some global variable in the kernel.)
>
> yes. see above.
>
Already supported, or will be supported?
I didn't find a way to support this based on your patchset.

>> 3). var s = {}; trace *:* { s[probename] += 1 }
>> The table variable s is allocated in the main context. As above,
>> BPF does not allow a table to be allocated in this flexible way. ktap also
>> allows assigning table entries before registering events, which BPF does
>> not support either.
>
> already supported.
> 's' is a table where key = probe_id, value = 4-byte integer
>
>> 4) var i = 0; trace *:* { i += 1}
>> A global variable is assigned here, and it could be assigned a value
>> other than 0. Please show me how BPF does this.
>> (See the complex global usage example in samples/schedule/schedtimes.kp)
>
> hmm. schedtimes.kp example doesn't have any global variables.
> RUNNING = 0 and SLEEPING = 2 are constants.
> as far as I can see even that complex example maps to bpf just fine
>
First, accessing global variables is not supported in your patchset; the
compiler reports this clearly. So if you think it is already supported,
that should read "will be supported".

Anyway, I'm glad to see that BPF is going to support this.

If I guess right, you plan to initialize global variables in the 'init'
section. That's fine with me, but again it is "will support", not
"supported". This mail is also the first time I have heard of the 'init'
and 'fini' sections; they were not mentioned before. It's good to see BPF
making progress.
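
As I understand the proposal, the model would look roughly like the toy
Python sketch below. This is only my reading of it, not code from the
patchset: the ObjFile class, the "probe" event name and the handler names
are all made up for illustration. A program attached to 'init' runs once at
load time and sets up the global, and the probe program then updates it.

```python
class ObjFile:
    """Toy model: programs attached to events, plus state they share."""

    def __init__(self):
        self.programs = {}   # event name -> function taking the globals dict
        self.globals = {}    # state shared by all programs in this object

    def attach(self, event, func):
        self.programs[event] = func

    def fire(self, event):
        func = self.programs.get(event)
        if func is not None:
            func(self.globals)

    def load(self):
        self.fire("init")    # runs once, before any probe event

    def unload(self):
        self.fire("fini")    # runs once, after the last probe event


def on_init(g):
    g["i"] = 0               # the 'var i = 0' part of the script


def on_probe(g):
    g["i"] += 1              # the 'trace *:* { i += 1 }' part


obj = ObjFile()
obj.attach("init", on_init)
obj.attach("probe", on_probe)
obj.load()
for _ in range(3):
    obj.fire("probe")
print(obj.globals["i"])
```

In this model a non-zero initial value is just a different assignment in
on_init, which is the case I asked about in sample 4.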

>> 5) kdebug.kprobe("SyS_futex", function () { print(pid) })
>> ktap registers the event through a function call, without changing the
>> core vm. Obviously BPF cannot support this flexible callback mechanism.
>
> I'm missing a 'callback' point here.
> seems you're attaching to futex and printing pid.
> That's supported.
>
The key point is that ktap implements this without changing the object file
format, whereas BPF needs a change. Anyway, I don't think hardcoding it in a
section is a big problem, but it should be self-described in the script,
without needing another cat command line.

>> 6). time.profile { print(stack()) }
>> Print the kernel stack on a timer. Note that ktap implements this with a
>> library function, without changing the bytecode object file format.
>
> I don't understand what 'time.profile' event is.
> Isn't this the same as attaching bpf program to some periodic
> event and printing stack? That's supported.
> Note: nothing stops the user to write bpf program that is attached
> to in kernel periodic event like timer.
> I just don't want a built-in mechanism for timers, since it's a can
> of worms from security point of view.
>
Sorry, the syntax is: profile-10us {...}
It means a timer fires on each CPU; the timer may be NMI-based, which is
needed to get the real kernel stack. Similarly, tick-10us means the timer
fires on only one CPU (stap and dtrace both support this kind of timer mode).

It does not mean attaching to an in-kernel periodic event; users need to be
able to set their own timer interval.
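
For illustration, here is a toy Python sketch of how such a timer spec
breaks down. It is not the ktap parser: the function name parse_timer is
made up, and the set of accepted units other than "us" is my assumption.

```python
import re

# Interval units expressed in nanoseconds (units besides "us" are assumed).
UNIT_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

SPEC_RE = re.compile(r"(profile|tick)-(\d+)(ns|us|ms|s)")


def parse_timer(spec):
    """Split a spec like 'profile-10us' into (mode, interval_ns, all_cpus)."""
    m = SPEC_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"bad timer spec: {spec!r}")
    mode, count, unit = m.groups()
    # 'profile' fires on every CPU, 'tick' on a single CPU.
    return mode, int(count) * UNIT_NS[unit], mode == "profile"


print(parse_timer("profile-10us"))
print(parse_timer("tick-10us"))
```

The interval comes from the script itself, which is the point: the user
picks it, rather than attaching to whatever periodic event already exists.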

Actually, I don't understand why you object to this timer event. You call
it a security issue, but what about perf? perf also allows the user to set
the timer frequency; does perf have a security issue too?

>> 7). trace_end
>> Note that there may be logic to execute in the trace_end part, not just
>> dumping everything as you said, so I don't understand why BPF
>> wants to move trace_end to userspace. Dtrace and stap both support
>> this, so why does BPF object to it?
>> Also, ktap implements trace_end with a function call, without changing
>> the core vm design. I hope BPF can do this without introducing any
>> change to the BPF object file format.
>
> in case of schedule/schedtimes.kp example
> trace_end event should be part of userspace, since it walks
> potentially very large tables.
> At the same time there is a 'fini' event that in-kernel bpf program
> can attach to.
> If one of the bpf programs in obj_file is attached to 'init' event
> it gets called upon obj_file loading. Similar with 'fini'.
>
Again, this is the first time I have heard that BPF will support "init"
and "fini" functions. Good move.

>> 8) call user defined function
>> It seems BPF cannot call a user-defined (non-inlined) function.
>> User-defined functions will be useful when a dynamic tracing solution
>> supports tapsets in the future (IMO user-defined tapsets are hard to avoid).
>
> completely the opposite.
> bpf_call instruction is the key difference between new bpf and
> classic bpf.
>
Perhaps you misunderstood. I mean calling a function defined in the script,
not one predefined in the kernel.

That is needed for tapsets, which I think cannot be avoided in dynamic
tracing tools (ktap/stap/dtrace). I don't think it's a big issue for the
first step; just keep in mind that a script may need to call another
function defined in the script.

>> in summary, three key issues in BPF:
>>
>> 1) BPF couples tables to the compiler/validation program.
>> As with the table design, I think that if BPF wants to support aggregation
>> in the future, it will have to change the compiler and validation, and
>> it will keep changing them as BPF supports more features.
>
> it should be clear that tables, bpf execution engine,
> kernel functions are decoupled building blocks.

I don't think "decoupled" is the right word when there is a
"table" section in the object format; it would not be there if tables
were truly decoupled.

Actually, I think we can find a way to move the table section out
of the object file without hurting safety (note that BPF may need to support
aggregation someday, which is another kind of table, and I don't think
adding more sections is a good idea). But let's finish the current issues
first; we can go through the table design last.

> verifier brings things together by allowing fixed set of kernel
> functions to be called from bpf program.
> Obviously we cannot allow arbitrary function call from
> programs. It's not safe.
>
>> 2) BPF doesn't allow execution in the main context
>> This is the main issue for ktap integration. ktap allows assigning
>> global variables and calling allowed functions before registering
>> events, in order to initialize things. This is mandatory for ktap, and
>> IMO it is mandatory for all generic dynamic tracing tools.
>
> not true. see all of the above.
>
Again, you only raised this solution (init and fini sections) in this mail, not before.

>> 3) BPF mixes event registration logic into the object file format
>> The ktap object file is not aware of any event logic; registration is
>> just a normal function call in ktap. The BPF object file, by contrast,
>> even has an "event" section.
>
> hmm. I'm missing 'issue' here.
> I think it's a feature not an issue.
> bpf program is a function. Like C function it doesn't embed
> in itself where it's supposed to be called.
> Separate 'section' in obj_file needs to describe relation
> between event and function.
>
As I said above, if the event section can be self-described, that's fine,
even though ktap does this more cleanly, without touching the object file format.

IMO we need to use either a string (such as syscalls:*) or an event id
(as perf and ktap do) to represent event registration; both are fine with me.

Anyway, I'm glad to see we already have some agreement on what
BPF needs to extend.

Thanks.

Jovi