Re: [PATCH v2 00/15] tracing: 'hist' triggers

From: Tom Zanussi
Date: Mon Mar 02 2015 - 14:55:41 EST

Hi Alexei,

On Mon, 2015-03-02 at 11:14 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 8:00 AM, Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:
> >
> > # echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount' > \
> > /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
> >
> > # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
> >
> > key: common_pid:bash[3112], id:sys_write vals: count:69
> > key: common_pid:bash[3112], id:sys_rt_sigprocmask vals: count:218
> Hi Tom,
> I think we both want to see in-kernel aggregation.
> This 'hist' stuff is trying to do counting and even map sorting
> in the kernel, whereas with bpf programs I'm moving
> all of these decisions to user space.
> I understand your desire to avoid any user level scripts
> and do everything via 'cat' and debugfs, but imo that's
> very limiting. I think it's better to do slim user space

It's consistent with the whole host of other ftrace tools, so I don't
know why that model would be any more limiting for this.

> scripting language that can translate to bpf even in
> embedded setups. Then users will be able to aggregate
> whatever they like, whereas with 'hist' approach
> they're limited to simple counters.
> trace_events_trigger.c - 1466 lines - that's quite a bit
> of code that will be rarely used. Kinda goes counter

I disagree that it would be rarely used. In fact, it would probably
cover about 80% of the use cases that people initially use things like
systemtap or dtrace for, which I guess is what ebpf is shooting for.

I'm also looking at systems that have very little memory and 8MB of
storage to work with, so streaming it all to userspace and
post-processing won't really work on those systems.

With some thought, though, I think the ebpf system/interpreter could be
made smart enough to recognize the simple patterns represented by the
hist triggers, and reuse them internally. So ftrace users get their
command-line version and it's also something ebpf can reuse.

> to embedded argument. Why add this to kernel
> when bpf programs can do the same on demand?

Because this demonstrates that all those things can be done without
introducing an interpreter into the mix, so why bother with the

> Also the arguments about stable ABI apply as well.
> The format of 'hist' file would need to be stable, so will
> be hard to extend it. With bpf programs doing aggregation

Well, the format is very regular - keys and values and summary lines,
not much to break there.
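
To illustrate how regular that is, here is a minimal sketch of a user space
reader for the key/value lines. It assumes the line shape shown in the sample
output quoted above ("key: name:value, ... vals: name:value"); the comma
separator between multiple values and the integer-only values are assumptions,
and the names parse_fields() and parse_hist_line() are made up for this
example. Summary lines are not handled.

```python
import re

# Assumed line shape, taken from the sample output quoted in this thread:
#   key: common_pid:bash[3112], id:sys_write vals: count:69
LINE_RE = re.compile(r"^key:\s*(?P<keys>.*?)\s+vals:\s*(?P<vals>.*)$")


def parse_fields(text, sep=","):
    """Split 'name:value' pairs separated by sep into a dict of strings."""
    fields = {}
    for part in text.split(sep):
        part = part.strip()
        if not part:
            continue
        # Split on the first ':' only, so values may contain ':' themselves.
        name, _, value = part.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def parse_hist_line(line):
    """Return (keys, vals) for one hist line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    keys = parse_fields(match.group("keys"))
    # Assumption: every value is a plain integer counter.
    vals = {name: int(v) for name, v in parse_fields(match.group("vals")).items()}
    return keys, vals


if __name__ == "__main__":
    sample = "key: common_pid:bash[3112], id:sys_write vals: count:69"
    print(parse_hist_line(sample))
```

Anything that does not start with "key:" is returned as None, so a caller can
skip lines it does not understand rather than fail on them.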

> the kernel ABI exposure is much smaller.
> So would you consider working together on adding
> clean bpf+tracepoints infra and corresponding
> user space bits?
> We can have small user space parser/compiler for
> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> strings that will convert it into bpf program and you'll
> be able to use it in embedded setups ?

Yeah, I wouldn't be averse to working together to create a clean
bpf+tracepoints infrastructure - I think creating a reusable component like
this would be a good first step.
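
As a rough sketch of the front half of the parser idea quoted above, the
trigger string could be split into keys and values along these lines. The
grammar is inferred only from the single example in this thread
('hist:keys=common_pid.execname,id.syscall:vals=hitcount'), so treating '.'
as a field modifier separator is an assumption, and parse_hist_trigger() is a
hypothetical name. Translating the result into a bpf program is not attempted
here.

```python
def parse_hist_trigger(spec):
    """Parse a 'hist:keys=...:vals=...' string into lists of (field, modifier).

    The grammar is an assumption based on one example string; only the
    'keys' and 'vals' clauses are recognized.
    """
    parts = spec.strip().split(":")
    if parts[0] != "hist":
        raise ValueError("not a hist trigger: %r" % spec)

    parsed = {"keys": [], "vals": []}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if not sep or name not in parsed:
            raise ValueError("unrecognized clause: %r" % part)
        for field in value.split(","):
            # 'common_pid.execname' -> field 'common_pid', modifier 'execname'
            fname, _, modifier = field.partition(".")
            parsed[name].append((fname, modifier or None))
    return parsed


if __name__ == "__main__":
    print(parse_hist_trigger(
        "hist:keys=common_pid.execname,id.syscall:vals=hitcount"))
```

The output is a plain dictionary, so whatever consumes it, whether a bpf
generator or something else, does not need to know the string syntax.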

BTW, I've actually tried to play around with the BPF samples/, but it
seems they're not actually hooked into the system, i.e. the samples
Makefile doesn't build them, and it even looks for a tools/llvm that's not
there. I got as far as getting the latest llvm from the location
mentioned in one of the bpf commit messages, but gave up after it told
me 'invalid target bpf'. And I couldn't find any documentation on how
to set it all up - did I just miss that?


> Thanks
