Re: [RFC PATCH 00/28] ktap: A lightweight dynamic tracing tool for Linux

From: Jovi Zhangwei
Date: Tue Apr 01 2014 - 00:47:30 EST


Hi Alexei,

On Tue, Apr 1, 2014 at 5:29 AM, Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
> On Mon, Mar 31, 2014 at 3:01 AM, Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> wrote:
>> Hi Ingo,
>>
>> On Mon, Mar 31, 2014 at 3:17 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>
>>> * Jovi Zhangwei <jovi.zhangwei@xxxxxxxxx> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The following set of patches adds the ktap tracing tool.
>>>>
>>>> ktap is a new script-based dynamic tracing tool for Linux.
>>>> It uses a scripting language and lets the user trace the system dynamically.
>>>>
>>>> Feature highlights:
>>>> * a simple but powerful scripting language
>>>> * register-based interpreter (heavily optimized) in the Linux kernel
>>>> * small and lightweight
>>>> * no dependency on the GCC toolchain for each script run
>>>> * easy to use in embedded environments without debugging info
>>>> * support for tracepoint, kprobe, uprobe, function trace, timer, and more
>>>> * supported on x86, ARM, PowerPC, MIPS
>>>> * safe execution in a sandbox
>>>
>>> I've asked this fundamental design question before but got no full
>>> answer: how does ktap compare to the ongoing effort of improving the
>>> BPF scripting engine?
>>>
>>
>> From my long experience developing ktap, what makes me really
>> love it is:
>>
>> 1) Availability
>> ktap is the only tool usable on small embedded platforms; stap
>> and BPF both need GCC now. stap at least has its own language, so
>> it is much better than BPF in that respect.
>> (IMO it may take several years to build even the skeleton of a
>> dynamic tracing scripting language; see stap and dtrace.)
>>
>> 2) Simplicity
>> ktap is the simplest dynamic tracing scripting solution in the Linux
>> world today, compared with stap/dtrace/BPF.
>> a) It has a simple syntax, which many people like.
>> b) It has simple associative arrays, which make dynamic tracing powerful.
>> c) It has a simple compiler that is only 87K on x86_64.
>> d) It has a simple tracing syntax that is consistent with perf events.
>>
>> 3) Safety
>> ktap has already demonstrated its safety to end users; many people use
>> ktap in their dev labs to investigate problems.
>> BPF still needs time to prove its safety, especially in the hands of end
>> users, and IMO the BPF safety checks will become more complex as the
>> runtime supports more features over time.
>
> safety of ktap is arguable.
>
> 1.
> From the diff it seems that 'loop_count' is a dynamic way of
> checking that loops are not infinite, but max_loop_count = 100000;
> if the loop body has many instructions, such a large count may trigger
> a hung_task panic.
>
Actually, I'm planning to use a time-based limit to avoid this; it's a minor issue.
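
A minimal C sketch of such a combined limit, assuming a hypothetical per-invocation budget (struct exec_budget, budget_start(), budget_tick() and run_infinite_loop() are illustrative names, not ktap code). It caps a run both by instruction count and by elapsed time, and reads the clock only every 1024 instructions to keep the check cheap. It uses userspace clock_gettime() where kernel code would read a kernel clock instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical per-invocation budget; not ktap's actual data structure. */
struct exec_budget {
	uint64_t max_insns;	/* cap on executed instructions */
	uint64_t max_ns;	/* cap on elapsed time, in nanoseconds */
	uint64_t insns;		/* instructions executed so far */
	uint64_t start_ns;	/* timestamp taken when the probe fired */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

void budget_start(struct exec_budget *b, uint64_t max_insns, uint64_t max_ns)
{
	b->max_insns = max_insns;
	b->max_ns = max_ns;
	b->insns = 0;
	b->start_ns = now_ns();
}

/* Called once per dispatched instruction: 0 = keep going, -1 = abort. */
int budget_tick(struct exec_budget *b)
{
	if (b->insns >= b->max_insns)
		return -1;
	b->insns++;
	/* Reading the clock is not free, so only do it every 1024 insns. */
	if ((b->insns & 1023) == 0 && now_ns() - b->start_ns > b->max_ns)
		return -1;
	return 0;
}

/* Stand-in for a dispatch loop running a script with an endless loop;
 * returns how many instructions ran before the budget stopped it. */
uint64_t run_infinite_loop(struct exec_budget *b)
{
	while (budget_tick(b) == 0) {
		/* decode and execute one bytecode instruction here */
	}
	return b->insns;
}
```

With max_insns = 100000 and a generous time limit, the loop stops after exactly 100000 instructions. With an unlimited instruction count and a 1 ms limit, the time check ends it instead, which is the case where expensive instructions would otherwise turn a "bounded" loop into a hung task.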

> 2.
> jumps are not counted, so if userspace makes an error and loads
> ktap bytecode with wrong jumps, the interpreter will hang.
>
There is a TODO left in the validation code for this. Since kernel developers
don't like having many TODOs in there, I will address it as well.
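
A load-time check for this could look like the following C sketch. The opcodes and the struct insn layout are hypothetical (ktap's real bytecode format differs); the point is that every jump target is checked against the program bounds before the program ever runs. Passing allow_backward = 0 gives the stricter classic BPF rule of forward-only jumps; with allow_backward = 1, loops remain possible and must be bounded at run time by the loop/time budget instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only. */
enum op { OP_NOP, OP_JMP, OP_JMP_IF, OP_RET };

struct insn {
	uint8_t op;
	int32_t off;	/* jump offset relative to the next instruction */
};

/* Returns 0 if the program is acceptable, -1 otherwise. */
int validate(const struct insn *prog, size_t n, int allow_backward)
{
	if (n == 0 || prog[n - 1].op != OP_RET)
		return -1;			/* must not fall off the end */

	for (size_t pc = 0; pc < n; pc++) {
		switch (prog[pc].op) {
		case OP_NOP:
		case OP_RET:
			break;
		case OP_JMP:
		case OP_JMP_IF: {
			int64_t target = (int64_t)pc + 1 + prog[pc].off;

			if (target < 0 || target >= (int64_t)n)
				return -1;	/* jump outside the program */
			if (!allow_backward && target <= (int64_t)pc)
				return -1;	/* backward jump could loop forever */
			break;
		}
		default:
			return -1;		/* unknown opcode */
		}
	}
	return 0;
}
```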

> 3.
> recursive functions and f1()->f2()->f1() are not detected either.
> Another possible hang?
>
No, it will exit via ktap's stack overflow check.
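
The idea behind such a stack overflow check, as a hedged C sketch (struct vm, vm_push_frame() and the frame sizes are hypothetical, not ktap's implementation): every script-level call pushes a frame onto a fixed-size VM stack, so f1()->f2()->f1() recursion with no base case fails with an overflow error after a bounded number of calls instead of hanging.

```c
#include <stddef.h>

/* Hypothetical fixed-size VM stack, measured in value slots. */
struct vm {
	size_t top;	/* slots currently in use */
	size_t cap;	/* total slots available to this invocation */
};

/* Push a call frame; -1 means "stack overflow", abort the script. */
int vm_push_frame(struct vm *vm, size_t frame_slots)
{
	if (frame_slots > vm->cap - vm->top)
		return -1;
	vm->top += frame_slots;
	return 0;
}

void vm_pop_frame(struct vm *vm, size_t frame_slots)
{
	vm->top -= frame_slots;
}

/* Simulates f1() -> f2() -> f1() -> ... without a base case, using
 * different frame sizes for f1 and f2; returns the call depth reached. */
int run_mutual_recursion(struct vm *vm, size_t f1_slots, size_t f2_slots)
{
	int depth = 0;

	for (;;) {
		size_t slots = (depth % 2 == 0) ? f1_slots : f2_slots;

		if (vm_push_frame(vm, slots) != 0)
			break;
		depth++;
	}
	return depth;
}
```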

> 4.
> bc_[ft]new instructions allocate memory, and the garbage collector is
> supposed to free things when the ktap module is unloaded, right?
> Since max_loop_cnt is 100k, a script can allocate quite a bit of memory,
> and the kernel will be waiting for a userspace trigger to free it?
> Sounds dangerous.
>
There will be a limit on the number of tables/functions, so this is not a problem.
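
One way such a limit could look, as a minimal C sketch (struct session, session_table_new() and the limit values are hypothetical): the table-creating instruction consults a per-session counter and fails once the quota is used up, so a script looping up to max_loop_cnt cannot create an unbounded number of tables. Note that a count limit bounds memory only if each table's size is capped as well.

```c
#include <stdlib.h>

/* Hypothetical per-session bookkeeping; not ktap's real structures. */
struct session {
	unsigned int ntables;		/* tables currently alive */
	unsigned int max_tables;	/* quota fixed when the script loads */
};

struct table {
	unsigned int nentries;		/* contents omitted in this sketch */
};

/* What a bc_tnew-style instruction would call: refuses once the quota
 * is exhausted instead of allocating without bound. */
struct table *session_table_new(struct session *s)
{
	struct table *t;

	if (s->ntables >= s->max_tables)
		return NULL;
	t = calloc(1, sizeof(*t));
	if (t)
		s->ntables++;
	return t;
}

void session_table_free(struct session *s, struct table *t)
{
	if (!t)
		return;
	free(t);
	s->ntables--;
}
```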

> These concerns are just from quick code review.
>
>> 4) Samples
>> Many people like the ktap samples; the samples show what makes ktap
>> attractive.
>>
>> Even though I love ktap so much and would like to share its value with
>> everyone, from a technical point of view I still agree with you that
>> there should be a unified scripting engine in the kernel if that engine
>> can serve many domains (like networking). But that solution should
>> first demonstrate its availability/simplicity/safety to users, and have
>> that proven by end users, not just claimed.
>>
>> A dynamic tracing scripting environment should contain:
>> simple compiler, clean language syntax, fast script engine,
>> associative array, aggregation, kstack, ustack, event management,
>> ring buffer, samples, tapset/library, CTF, etc.
>>
>> ktap already addresses most of these with its simple design, but
>> BPF only has the "script engine" part (its associative array still cannot
>> use vmalloc), and it has a long road ahead before end users can use it.
>
> 'internal bpf' instruction set is an assembler instruction set.
> Low level just like x86 instruction set.
> It doesn't have vmalloc instruction and shouldn't have.
> An 'internal bpf' program can theoretically make a call to allocate
> memory, but I don't think it's safe to let loadable programs
> arbitrarily allocate memory.
> It's a matter of ownership of the memory.
> If script can allocate and receive a pointer to memory,
> the script owns that memory and kernel cannot touch it until
> script does 'free' or terminates and GC kicks in.

Wrong: ktap doesn't have a vmalloc instruction. ktap only uses
vmalloc for table and memory pool pre-allocation.

> ktap can be invoked through timers, so these dynamically
> allocated tables may live for a long time, affecting the whole
> system. The tracing tool should be safer than that.
>
Wrong again: ktap tables cannot be allocated in timer context.
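
A minimal C sketch of that allocation model, under stated assumptions: the pool is allocated once when the script starts (malloc() here stands in for vmalloc()), handlers only carve memory out of it with a bump allocator that never calls the general allocator, and table creation is refused outright in timer context. enum ctx, struct pool, pool_alloc() and table_new() are hypothetical names, not ktap's API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical contexts a ktap handler might run in. */
enum ctx { CTX_PROBE, CTX_TIMER };

struct pool {
	unsigned char *base;	/* single up-front allocation */
	size_t size;		/* bytes in the region */
	size_t used;		/* bytes handed out so far */
};

/* Done once at script start, where a sleeping allocation is allowed. */
int pool_init(struct pool *p, size_t size)
{
	p->base = malloc(size);
	p->size = p->base ? size : 0;
	p->used = 0;
	return p->base ? 0 : -1;
}

/* Bump allocation from the pre-allocated region: never calls the general
 * allocator, and fails cleanly once the region is exhausted. */
void *pool_alloc(struct pool *p, size_t n)
{
	void *ptr;

	if (n > p->size)
		return NULL;
	n = (n + 15) & ~(size_t)15;	/* keep offsets 16-byte aligned */
	if (n > p->size - p->used)
		return NULL;
	ptr = p->base + p->used;
	p->used += n;
	return ptr;
}

/* Table creation policy: never from timer handlers, pool-only elsewhere. */
void *table_new(struct pool *p, enum ctx c, size_t n)
{
	if (c == CTX_TIMER)
		return NULL;
	return pool_alloc(p, n);
}

void pool_destroy(struct pool *p)
{
	free(p->base);
	p->base = NULL;
	p->size = p->used = 0;
}
```

The design choice here is that running out of pool space is an ordinary, checkable failure at the allocation site, and nothing allocated from a probe or timer can outlive the pool that the script's own teardown frees.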

>> ktap doesn't just bring a bytecode engine; it brings a complete, simple
>> dynamic tracing environment to end users: clean language syntax,
>> samples, flexible tables, perf-like event management, etc. Those are the
>> key parts for end users, not the bytecode engine. So if some day we can
>> develop a simple BPF compiler with ktap-like syntax, then we can replace
>> kp_lex.c/kp_parse.c/kp_vm.c, and there is no reason why the other
>> parts cannot be shared (associative array, aggregation, kstack, ustack,
>> event management, ring buffer, samples, tapset/library, CTF, etc).
>
> I think nothing stops ktap userspace to parse ktap language
> and generate 'internal bpf' format. gcc is unnecessary here.

That's a big engineering problem: BPF bytecode is too low level, and the
BPF engine exposes too much low-level detail to end users. See this BPF example:

void dropmon(struct bpf_context *ctx) {
	void *loc;
	uint64_t *drop_cnt;

	loc = (void *)ctx->arg2;

	drop_cnt = bpf_table_lookup(ctx, 0, &loc);
	if (drop_cnt) {
		__sync_fetch_and_add(drop_cnt, 1);
	} else {
		uint64_t init = 0;
		bpf_table_update(ctx, 0, &loc, &init);
	}
}

IMO there are many issues in this simple script.

If the user forgets to add the drop_cnt check, what happens? It will
dereference a NULL pointer in __sync_fetch_and_add.
And how do we make sure the drop_cnt pointer is a valid memory address
inside the table, and not some other kernel memory allocation?

Look at the bpf_table_update function: if the BPF table overflows, there
is no way to stop the script from executing at that point, and it goes on
doing completely wrong things. So you have to add exit-condition checks
after bpf_table_update (and maybe after most C function calls).

And obviously you are missing the table lock/unlock there.

In contrast, look at a ktap script with the same functionality:

var s = {}

trace skb:kfree_skb {
	s[arg2] += 1
}

The user doesn't need to handle error checking or table locking at all,
at either the source level or the bytecode level.
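
To show what that hidden work amounts to, here is a hedged C sketch of what a runtime has to do for s[arg2] += 1: take the table lock, find or insert the key, bump the counter, and report a full table as an error instead of handing the script a pointer it could misuse. It is a fixed-capacity open-addressing table with a C11 atomic_flag spinlock standing in for a kernel spinlock; struct count_table and its functions are hypothetical, not ktap's table implementation.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TAB_SLOTS 64	/* hypothetical fixed capacity, a power of two */

struct count_table {
	atomic_flag lock;	/* stands in for a kernel spinlock */
	int full;		/* set once an insert had to be refused */
	unsigned char used[TAB_SLOTS];
	uintptr_t key[TAB_SLOTS];
	uint64_t val[TAB_SLOTS];
};

static void tab_lock(struct count_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void tab_unlock(struct count_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Multiplicative hashing; masking with TAB_SLOTS - 1 gives a valid index. */
static size_t home_slot(uintptr_t key)
{
	return (size_t)((uint64_t)key * 0x9E3779B97F4A7C15ull) & (TAB_SLOTS - 1);
}

void count_table_init(struct count_table *t)
{
	atomic_flag_clear(&t->lock);
	t->full = 0;
	for (size_t i = 0; i < TAB_SLOTS; i++)
		t->used[i] = 0;
}

/* s[key] += 1: returns 0 on success, -1 if the table is full, so the
 * runtime can stop the script instead of writing through a bad pointer. */
int count_table_inc(struct count_table *t, uintptr_t key)
{
	int ret = -1;
	size_t h = home_slot(key);

	tab_lock(t);
	for (size_t i = 0; i < TAB_SLOTS; i++) {
		size_t s = (h + i) & (TAB_SLOTS - 1);

		if (!t->used[s]) {		/* first sighting: insert as 1 */
			t->used[s] = 1;
			t->key[s] = key;
			t->val[s] = 1;
			ret = 0;
			break;
		}
		if (t->key[s] == key) {		/* existing key: increment */
			t->val[s]++;
			ret = 0;
			break;
		}
	}
	if (ret != 0)
		t->full = 1;
	tab_unlock(t);
	return ret;
}

/* Read a counter; keys never seen read as 0. */
uint64_t count_table_get(struct count_table *t, uintptr_t key)
{
	uint64_t v = 0;
	size_t h = home_slot(key);

	tab_lock(t);
	for (size_t i = 0; i < TAB_SLOTS; i++) {
		size_t s = (h + i) & (TAB_SLOTS - 1);

		if (!t->used[s])
			break;
		if (t->key[s] == key) {
			v = t->val[s];
			break;
		}
	}
	tab_unlock(t);
	return v;
}
```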

From the end user's point of view, they want clean language syntax like
the ktap example above, so if BPF has the same dynamic tracing goal, it
should follow this approach.

BPF is good, but it has many engineering problems to solve before it
becomes a usable dynamic tracing solution.

> I personally think that tracing scripts in C are more readable,
> but that's minor.

No end user will use your dropmon.c in their
dynamic tracing environment.

> But before we go about generating either 'internal bpf' or any
> other format, we need to discuss safe scripting design principles.
> We already have systemtap that is relying on userspace for verification.
> If we want a real alternative to systemtap, kernel should take care
> of safety.
>
> Note I'm not proposing to expose 'internal bpf' to userspace in uapi
> headers and I think ktap shouldn't do it either.
> Kernel hosted userspace component (like perf) that uses
> kernel specific headers allows for much cleaner interfaces
> without creating 'forever maintain' headache.
>
Hmm, looks reasonable.

Thanks.

Jovi