Re: [PATCH v2 00/15] tracing: 'hist' triggers

From: Tom Zanussi
Date: Mon Mar 02 2015 - 20:19:07 EST

On Mon, 2015-03-02 at 16:01 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi
> <tom.zanussi@xxxxxxxxxxxxxxx> wrote:
> >
> > I disagree that it would be rarely used. In fact, it would probably
> > cover about 80% of the use cases that people initially use things like
> > systemtap or dtrace for, which I guess is what ebpf is shooting for.
> 'hist' style won't solve any of the use cases I'm targeting with bpf.
> So, imo, 'hist' being 80% of dtrace is far from reality...
> but let's agree to disagree. it's not that important.
> I'm not saying don't do 'hist' at all.
> I'm only suggesting to do it differently.
> > I'm also looking at systems that have very little memory and 8Mb of
> > storage to work with, so streaming it all to userspace and
> > post-processing won't really work on those systems.
> I'm not suggesting to post-process. Quite the opposite.
> Let programs do ++ in the kernel, since that's what
> your patch 12 is doing, but in a hard coded way.
> > With some thought, though, I think the ebpf system/interpreter could be
> > made smart enough to recognize the simple patterns represented by the
> > hist triggers, and reuse them internally. So ftrace users get their
> > command-line version and it's also something ebpf can reuse.
> I'm saying keep the command line version of hist, but let
> user space process it.
> I don't buy the argument that you must run it in busybox
> without any extra tools.
> If you're in busybox, the system is likely idle, so nothing
> to trace/analyze. If you have some user space apps,
> then it's equally easy to add a 'hist->bpf' tool.

How about systems that run a single statically linked process with no
shell (but a service that can read and write files like event/trigger
and event/hist)? We'd still like to be able to trace those systems.

> >> to embedded argument. Why add this to kernel
> >> when bpf programs can do the same on demand?
> >
> > Because this demonstrates that all those things can be done without
> > introducing an interpreter into the mix, so why bother with the
> > interpreter?
> because interpreter is done once for all use cases,
> whereas custom 'hist' code is doing only one thing for one use case.

I agree that the hist functionality is a subset of what can be done with
a full-blown interpreter, but it's not doing just one thing for one use
case - it covers a whole set of use cases.

> >> the kernel ABI exposure is much smaller.
> >> So would you consider working together on adding
> >> clean bpf+tracepoints infra and corresponding
> >> user space bits?
> >> We can have small user space parser/compiler for
> >> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> >> strings that will convert it into bpf program and you'll
> >> be able to use it in embedded setups ?
> >
> > Yeah, wouldn't be averse to working together to create a clean bpf
> > +tracepoints infrastructure - I think creating a reusable component like
> > this would be a good first step.
> great.
> From the program you can emit the same text format
> as in your 'cat hist' example.
> But it will not be a part of stable kernel ABI, which I think is
> one of the main advantages to do such printing from programs
> instead of kernel C code.
> If you decide to extend what is being printed, you can
> tweak 'hist->bpf' tool and print something else.
> No one will complain, whereas when you would want
> to extend the format of 'hist' file printed by kernel, you'd
> need to consider all user tools that are parsing it.
> Like we saw in systrace example...
> > BTW, I've actually tried to play around with the BPF samples/, but it
> > seems they're not actually hooked into the system i.e. the samples
> > Makefile doesn't build them, and it even looks for tools/llvm that's not
> > there. I got as far as getting the latest llvm from the location
> > mentioned in one of the bpf commit messages, but gave up after it told
> > me 'invalid target bpf'. And I couldn't find any documentation on how
> > to set it all up - did I just miss that?
> the comment next to 'tool/llvm' says 'point to your llvm' :)
> so yes, to build C examples one needs to install latest llvm trunk.
> If you're saying that existing bpf stuff is hard to use, then yes.

Well, I'd say writing BPF 'assembly' to do anything isn't something more
than a few users in the world would even consider, so that's completely
out. Which means the only practical way to use it is via the C
interface. But getting that set up properly doesn't seem
straightforward either - it isn't something the Makefile will help with,
and there's no documentation on how one might do it.

So I tweaked the Makefile to get samples/bpf into the build (the
directory is there under samples/, so why do I need to add it to the
Makefile myself?) and tried building, which failed until I tweaked
something else to get it to find the right headers, etc. Finally I got
it building the userspace stuff, but then found out I needed my own llvm
to get the kernel modules built. So I searched and found your llvm tree,
which I thought would configure the bpf backend automatically, but
apparently not, since it then failed with "llc: invalid target 'bpf'",
which is where I gave up. Do I need to configure with --target=bpf or
something like that? I don't know, and I know nothing about llvm, so I'm
kind of stuck.

I really do want to try doing something with it, and I understand that
you're working on improving the user experience, but at this point it
seems users have to jump through a lot of hoops just to get a minimally
working setup. Even a small paragraph with some basic instructions
would help. Or maybe it's just me, and it works for everyone else out
of the box.


> I completely agree. It is hard to use. We're working on it.
> The user bits can be improved gradually unlike kernel/user
> boundary. Once you set it to be 'hist' file format it will stay
> forever.
