Re: [PATCH v2 00/15] tracing: 'hist' triggers
From: Alexei Starovoitov
Date: Mon Mar 02 2015 - 19:02:17 EST
On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi
<tom.zanussi@xxxxxxxxxxxxxxx> wrote:
>
> I disagree that it would be rarely used. In fact, it would probably
> cover about 80% of the use cases that people initially use things like
> systemtap or dtrace for, which I guess is what ebpf is shooting for.
'hist' style won't solve any of the use cases I'm targeting with bpf.
So, imo, 'hist' being 80% of dtrace is far from reality...
but let's agree to disagree. It's not that important.
I'm not saying don't do 'hist' at all.
I'm only suggesting to do it differently.
> I'm also looking at systems that have very little memory and 8Mb of
> storage to work with, so streaming it all to userspace and
> post-processing won't really work on those systems.
I'm not suggesting to post-process. Quite the opposite.
Let programs do ++ in the kernel, since that's what
your patch 12 is doing, but in a hard coded way.
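To make the "do ++ in the kernel" idea concrete, here is a rough model
of it. This is plain Python standing in for a program plus a map, not
actual bpf code; the event dicts and the on_event() helper are
hypothetical names used only for illustration.

```python
from collections import Counter

# Stand-in for a bpf map: key tuple -> hit count.
hist = Counter()

def on_event(event, keys=("common_pid", "id")):
    """Model of the per-event work: build the key, then do ++ on its slot."""
    key = tuple(event[k] for k in keys)
    hist[key] += 1

# Three made-up syscall-entry events.
events = [
    {"common_pid": 101, "id": 0},
    {"common_pid": 101, "id": 0},
    {"common_pid": 202, "id": 1},
]
for ev in events:
    on_event(ev)
```

Only the aggregated counts need to leave the kernel, so nothing is
streamed to user space for post-processing.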
> With some thought, though, I think the ebpf system/interpreter could be
> made smart enough to recognize the simple patterns represented by the
> hist triggers, and reuse them internally. So ftrace users get their
> command-line version and it's also something ebpf can reuse.
I'm saying keep the command line version of hist, but let
user space process it.
I don't buy the argument that you must run it in busybox
without any extra tools.
If you're in busybox, the system is likely idle, so there is nothing
to trace/analyze. If you have some user space apps,
then it's equally easy to add a 'hist->bpf' tool.
>> to embedded argument. Why add this to kernel
>> when bpf programs can do the same on demand?
>
> Because this demonstrates that all those things can be done without
> introducing an interpreter into the mix, so why bother with the
> interpreter?
Because the interpreter is done once for all use cases,
whereas custom 'hist' code does only one thing for one use case.
>> the kernel ABI exposure is much smaller.
>> So would you consider working together on adding
>> clean bpf+tracepoints infra and corresponding
>> user space bits?
>> We can have small user space parser/compiler for
>> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
>> strings that will convert it into bpf program and you'll
>> be able to use it in embedded setups ?
>
> Yeah, wouldn't be averse to working together to create a clean bpf
> +tracepoints infrastructure - I think creating a reusable component like
> this would be a good first step.
great.
From the program you can emit the same text format
as in your 'cat hist' example.
But it will not be a part of stable kernel ABI, which I think is
one of the main advantages of doing such printing from programs
instead of kernel C code.
If you decide to extend what is being printed, you can
tweak 'hist->bpf' tool and print something else.
No one will complain, whereas if you want to extend the format
of the 'hist' file printed by the kernel, you'd need to consider
all the user tools that parse it, as we saw in the systrace
example...
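As a sketch of the user space side of such a 'hist->bpf' tool, the
following covers only two steps: parsing the trigger string and
printing aggregated counts. Generating the bpf program itself is not
shown. The function names are hypothetical, the part after the dot in
each key is treated as a display modifier (an assumption about the
syntax), and the output layout is made up rather than the kernel's
'hist' format.

```python
def parse_hist_trigger(spec):
    """Split 'hist:keys=a.b,c.d:vals=x' into keys, key modifiers, and values."""
    kind, *params = spec.split(":")
    if kind != "hist":
        raise ValueError("not a hist trigger: %r" % spec)
    fields = dict(p.split("=", 1) for p in params)
    keys = []
    for item in fields["keys"].split(","):
        # 'common_pid.execname' -> field 'common_pid', modifier 'execname'
        name, _, modifier = item.partition(".")
        keys.append((name, modifier or None))
    vals = fields["vals"].split(",")
    return {"keys": keys, "vals": vals}

def format_hist(parsed, counts):
    """Render map contents as text; the layout here is illustrative only."""
    names = [name for name, _ in parsed["keys"]]
    lines = []
    for key, hits in sorted(counts.items()):
        cols = " ".join("%s: %s" % (n, v) for n, v in zip(names, key))
        lines.append("%s hitcount: %d" % (cols, hits))
    return "\n".join(lines)

parsed = parse_hist_trigger(
    "hist:keys=common_pid.execname,id.syscall:vals=hitcount")
# Made-up counts, as they might be read back from a map.
print(format_hist(parsed, {(101, 0): 2, (202, 1): 1}))
```

Since the printing lives in format_hist(), changing what is printed
means changing the tool, not a file format exported by the kernel.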
> BTW, I've actually tried to play around with the BPF samples/, but it
> seems they're not actually hooked into the system i.e. the samples
> Makefile doesn't build them, and it even looks for tools/llvm that's not
> there. I got as far as getting the latest llvm from the location
> mentioned in one of the bpf commit messages, but gave up after it told
> me 'invalid target bpf'. And I couldn't find any documentation on how
> to set it all up - did I just miss that?
The comment next to 'tools/llvm' says 'point to your llvm' :)
So yes, to build the C examples one needs to install the latest llvm trunk.
If you're saying that existing bpf stuff is hard to use, then yes.
I completely agree. It is hard to use. We're working on it.
The user bits can be improved gradually, unlike the kernel/user
boundary. Once you set it to be the 'hist' file format, it will stay
forever.