Re: ktap and ebpf integration

From: Alexei Starovoitov
Date: Sat Apr 05 2014 - 13:22:34 EST


On Sat, Apr 5, 2014 at 7:23 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> wrote:
> On Sat, Apr 5, 2014 at 1:28 AM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>
>> 'ktap syntax' from user space point of view, can use ibpf as-is.
>> Show me the script and I can show how ibpf can run it.
>
> Well, please don't engage 'ktap syntax' in here, if you think
> "Integration" only means ktap compiler compiles ktap syntax
> into BPF bytecode, then that's entirely misunderstood what's
> the real problem in there, some ktap samples in below:

Great. Nice examples.
To better understand how they map to bpf architecture
consider what bpf is:
- bpf instruction set = assembler code
- one bpf program = one function
- obj_file generated by ktap or C compiler consists of multiple
  bpf programs (functions) and each one attaches to one or
  multiple events
- events are [ku]probe/tracepoint events including
  init and fini events
- bpf program cannot have loops or call other bpf programs,
  though it can call safe kernel functions like bpf_printk,
  bpf_gettimeofday, bpf_getpid, etc
- one of such calls is 'bpf_load_pointer' = non-faulting access
  to any memory
- another call is 'bpf_table_lookup' that does table lookup
- bpf tables are not part of execution engine.
  tables are owned by kernel. User space can access them
  via netlink and maybe through other mechanisms (like debugfs).
  Normal kernel C functions (like bpf_table_update) can access
  them in parallel.
  'tables' is a mechanism to pass data between bpf programs
  and between bpf program and userspace
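
To make the list above concrete, here is a rough user-space model in
plain C. It is only a sketch: every name in it (prog_fn, attachment,
obj_file, fire_event, load_obj, unload_obj, demo_run) is made up for
illustration and is not a kernel interface, and the event strings are
placeholders. It shows one program = one function, an obj_file that
holds several programs, one program attached to more than one event,
and init/fini being ordinary events.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are a kernel API. */

/* One "program" is one function; ctx stands in for the event context. */
typedef void (*prog_fn)(void *ctx);

/* One entry of the obj_file's event section: event name -> program. */
struct attachment {
    const char *event;
    prog_fn     prog;
};

struct obj_file {
    const struct attachment *attach;
    size_t                   n_attach;
};

/* Fire an event: run every program attached to it, return how many ran. */
int fire_event(const struct obj_file *obj, const char *event, void *ctx)
{
    int ran = 0;

    assert(obj != NULL && event != NULL);
    for (size_t i = 0; i < obj->n_attach; i++) {
        if (strcmp(obj->attach[i].event, event) == 0) {
            obj->attach[i].prog(ctx);
            ran++;
        }
    }
    return ran;
}

/* In this model, loading fires "init" and unloading fires "fini". */
int load_obj(const struct obj_file *obj, void *ctx)   { return fire_event(obj, "init", ctx); }
int unload_obj(const struct obj_file *obj, void *ctx) { return fire_event(obj, "fini", ctx); }

/* Example programs: straight-line code, no loops, no calls to each other. */
struct counters { int hits; int finished; };

void on_init(void *ctx)  { ((struct counters *)ctx)->hits = 0; }
void on_probe(void *ctx) { ((struct counters *)ctx)->hits += 1; }
void on_fini(void *ctx)  { ((struct counters *)ctx)->finished = 1; }

/* Load, fire three probe events, unload; returns hits, or -1 if fini never ran. */
int demo_run(void)
{
    static const struct attachment section[] = {
        { "init",                    on_init  },
        { "syscalls:sys_enter_open", on_probe },  /* same program, */
        { "syscalls:sys_enter_read", on_probe },  /* two events    */
        { "fini",                    on_fini  },
    };
    const struct obj_file obj = { section, sizeof section / sizeof section[0] };
    struct counters c = { -1, 0 };

    load_obj(&obj, &c);
    fire_event(&obj, "syscalls:sys_enter_open", &c);
    fire_event(&obj, "syscalls:sys_enter_open", &c);
    fire_event(&obj, "syscalls:sys_enter_read", &c);
    unload_obj(&obj, &c);
    return c.finished ? c.hits : -1;
}
```

In this model demo_run() returns 3: init resets the counter, the three
probe events each bump it once, and fini marks the run as finished.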

> 1). trace syscalls:* { print(argstr) }
> Register many events.
> I posted this script in previous mail, but don't get the answer
> how to support this in BPF.
> Note ktap implement this by library function(kdebug,trace_by_id),
> not change object file, can BPF does this?

yes. should be clear from above explanation.

> 2). print("hello world")
> This is simplest hello world script in ktap, note that the
> executing context is not probe context, but in main ktap
> context, BPF main context only allow declare table,
> nothing else.
> (You may think this helloworld script is not useful, but not
> true, many script don't have to run in probe context, for
> example, the script just want to read some global variable in kernel)

yes. see above.

> 3). var s = {}; trace *:* { s[probename] += 1 }
> variable table s is allocated in main context, same as above,
> BPF disallow allocate table in this flexible way, ktap allow
> assign table entries before register events, BPF also don't support.

already supported.
's' is a table where key = probe_id, value = 4-byte integer
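
As a sketch of what that means, assuming a small fixed-size table and
made-up names (struct table, table_lookup, table_update, count_probe,
demo_hits are illustrative stand-ins, not the real bpf_table_* calls),
the body of the trace block is a lookup followed by an increment or an
insert. The loops live in the table code, which plays the role of a
kernel function; the program body itself has none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of table 's': key = probe_id, value = 4-byte integer. */
#define TABLE_SLOTS 64u  /* arbitrary capacity chosen for this sketch */

struct entry { uint32_t probe_id; uint32_t count; int used; };
struct table { struct entry slot[TABLE_SLOTS]; };

/* Return a pointer to the value stored for probe_id, or NULL if absent. */
uint32_t *table_lookup(struct table *t, uint32_t probe_id)
{
    assert(t != NULL);
    for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
        struct entry *e = &t->slot[(probe_id + i) % TABLE_SLOTS];
        if (!e->used)
            return NULL;            /* linear probing: empty slot ends the search */
        if (e->probe_id == probe_id)
            return &e->count;
    }
    return NULL;
}

/* Insert or overwrite a value; return 0 on success, -1 if the table is full. */
int table_update(struct table *t, uint32_t probe_id, uint32_t value)
{
    assert(t != NULL);
    for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
        struct entry *e = &t->slot[(probe_id + i) % TABLE_SLOTS];
        if (!e->used || e->probe_id == probe_id) {
            e->used = 1;
            e->probe_id = probe_id;
            e->count = value;
            return 0;
        }
    }
    return -1;
}

/* Body of: trace *:* { s[probename] += 1 }  (no loops in the program itself) */
void count_probe(struct table *t, uint32_t probe_id)
{
    uint32_t *v = table_lookup(t, probe_id);

    if (v)
        *v += 1;
    else
        table_update(t, probe_id, 1);
}

/* Fire a fixed sequence of probes, then read one counter back (0 if absent). */
uint32_t demo_hits(uint32_t probe_id)
{
    struct table t;
    uint32_t *v;

    memset(&t, 0, sizeof t);
    count_probe(&t, 7);
    count_probe(&t, 7);
    count_probe(&t, 7);
    count_probe(&t, 71);            /* 71 % 64 == 7: collides with probe 7 */
    v = table_lookup(&t, probe_id);
    return v ? *v : 0;
}
```

With that sequence demo_hits(7) is 3, demo_hits(71) is 1 and
demo_hits(5) is 0. Reading the counters back after the run is the part
that user space would do.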

> 4) var i = 0; trace *:* { i += 1}
> Assign global variable in here, there also can assign other
> value not 0, please show me how BPF do this.
> (See complex global usage example in samples/schedule/schedtimes.kp)

hmm. schedtimes.kp example doesn't have any global variables.
RUNNING = 0 and SLEEPING = 2 are constants.
as far as I can see even that complex example maps to bpf just fine

> 5) kdebug.kprobe("SyS_futex", function () { print(pid) })
> ktap register event through function call, not change any core vm,
> obviously BPF cannot support this flexible callback mechanism.

I'm missing a 'callback' point here.
seems you're attaching to futex and printing pid.
That's supported.

> 6). time.profile { print(stack()) }
> print kernel stack in timer manner. Note ktap implement this by library
> function, not change any bytecode object file format.

I don't understand what 'time.profile' event is.
Isn't this the same as attaching bpf program to some periodic
event and printing stack? That's supported.
Note: nothing stops the user from writing a bpf program that is attached
to an in-kernel periodic event like a timer.
I just don't want a built-in mechanism for timers, since it's a can
of worms from security point of view.

> 7). trace_end
> Note there may have execute logic in trace_end part, not just only
> dump everything as you said, so I don't understand why BPF
> want to move trace_end to userspace, Dtrace/stap both support
> this, why BPF object this?
> And ktap implement trace_end by function call, not change
> any core vm design, hope BPF can do this without introduce any
> change in BPF object file format.

in the case of the schedule/schedtimes.kp example, the
trace_end event should be part of userspace, since it walks
potentially very large tables.
At the same time there is a 'fini' event that in-kernel bpf program
can attach to.
If one of the bpf programs in obj_file is attached to 'init' event
it gets called upon obj_file loading. Similar with 'fini'.

> 8) call user defined function
> It seems BPF cannot call user defined function(not inlined),
> user defined function is useful when dynamic tracing solution
> support tapset in future(IMO it's hard to avoid user defined tapset).

completely the opposite.
bpf_call instruction is the key difference between new bpf and
classic bpf.

> in summary, three key issues in BPF:
>
> 1) BPF couples table in compiler/validation program.
> Similar with table design, I think if BPF want to support aggregation
> in future, it must need to change compiler and validation, and
> will keep changes if BPF support more features.

it should be clear that tables, bpf execution engine,
kernel functions are decoupled building blocks.
verifier brings things together by allowing fixed set of kernel
functions to be called from bpf program.
Obviously we cannot allow arbitrary function call from
programs. It's not safe.
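
A minimal sketch of just that one check, assuming a toy instruction
format: the enum values, struct insn, helper ids and function names
below are invented for illustration and do not match the real bpf
encoding, and a real verifier checks far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction model; not the real bpf encoding. */
enum op { OP_ALU, OP_LOAD, OP_CALL, OP_EXIT };

struct insn {
    enum op op;
    int32_t imm;   /* for OP_CALL: id of the kernel function being called */
};

/* Fixed set of callable kernel functions (ids are illustrative). */
enum helper {
    H_PRINTK = 1,
    H_GETTIMEOFDAY,
    H_GETPID,
    H_LOAD_POINTER,
    H_TABLE_LOOKUP,
    H_MAX
};

int helper_allowed(int32_t id)
{
    return id >= H_PRINTK && id < H_MAX;
}

/*
 * Return 0 if the program ends with exit and every call targets an
 * allowed function, -1 otherwise.
 */
int verify_calls(const struct insn *prog, size_t len)
{
    assert(prog != NULL || len == 0);
    if (len == 0 || prog[len - 1].op != OP_EXIT)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (prog[i].op == OP_CALL && !helper_allowed(prog[i].imm))
            return -1;
    }
    return 0;
}

/* A program that only calls allowed functions. */
int demo_good(void)
{
    const struct insn prog[] = {
        { OP_CALL, H_GETPID },
        { OP_CALL, H_PRINTK },
        { OP_EXIT, 0 },
    };
    return verify_calls(prog, sizeof prog / sizeof prog[0]);
}

/* A program that tries to call something outside the fixed set. */
int demo_bad(void)
{
    const struct insn prog[] = {
        { OP_CALL, 1000 },
        { OP_EXIT, 0 },
    };
    return verify_calls(prog, sizeof prog / sizeof prog[0]);
}
```

Here demo_good() is accepted (0) and demo_bad() is rejected (-1).
Adding a new callable function means extending the fixed set, not
changing the execution engine.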

> 2) BPF don't allow execute in main context
> This is the main issue to for ktap integration, ktap allow
> assign global variable, call allowed function before register
> events to initiate things, this is mandatory for ktap, and
> IMO it is mandatory for all generic dynamic tracing tools.

not true. see all of the above.

> 3) BPF mix event register logic in object format file
> ktap object file don't aware any event logic, it's just a normal
> function all in ktap, but in BPF object file, there even have a "event"
> section.

hmm. I'm missing 'issue' here.
I think it's a feature not an issue.
bpf program is a function. Like C function it doesn't embed
in itself where it's supposed to be called.
Separate 'section' in obj_file needs to describe relation
between event and function.

> IMO, BPF engine should be a simple and generic script engine,
> just focus on the script engine, not features(table/aggregation/

we won't be able to go too far while you keep thinking
of "execution engine" as "script engine"

> All these issues make we cannot let ktap run on BPF engine because
> of current BPF limited and specific design.

imo that sounds like you are just trying to find an excuse to do your own
"script engine"
You probably got an impression that I'm shutting down all of your
'bpf extension' requests. Not at all.
If things are missing, let's add them.

Thanks
Alexei