Re: [RFC PATCH 3/3] kernel tracing prototype

From: Jiaying Zhang
Date: Fri Dec 05 2008 - 03:05:11 EST


To better answer the question of why we want to implement a new kernel
tracing prototype, here are some performance results we collected earlier
with the tbench benchmark. At the time we were considering using the existing
marker code, and we built a simple prototype that allowed us to measure the
performance overhead from the different layers.

- vanilla 2.6.26 kernel, CONFIG_MARKERS=n
Throughput 759.352 MB/sec 4
- markers compiled in, tracing disabled
Throughput 754.18 MB/sec 4
- tracing syscall entry/exit, use markers, not logging data to ring_buffer
Throughput 715.68 MB/sec 4
- tracing syscall entry/exit, use markers, logging data to ring_buffer
Throughput 654.056 MB/sec 4
- tracing syscall entry/exit, bypass markers, not logging data to ring_buffer
Throughput 735.575 MB/sec 4
- tracing syscall entry/exit, bypass markers, logging data to ring_buffer
Throughput 667.68 MB/sec 4

We found that the marker layer alone adds about 3% overhead when
tracing is enabled (compare the "use markers" and "bypass markers" runs
with ring_buffer logging disabled: 715.68 vs. 735.575 MB/sec). This
overhead comes mostly from the extra code and function calls added by
the markers code. In our use case, we don't need several features of the
current marker code, such as support for multiple probe functions and the
use of va_arg to allow arbitrary trace formats, but we do have very strict
performance requirements. So we implemented our own trace instrumentation
code, which shares many similarities with markers but is simpler and
therefore, we think, easier to make efficient.
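
To make the intended usage concrete, here is a rough sketch of how a trace
point would be declared and fired with the DEFINE_KTRACE_STRUCT/DO_TRACE
macros from the patch quoted below (this is only an illustration, not part
of the patch; the event name, fields, and format string are made up):

DEFINE_KTRACE_STRUCT(syscall_entry) {
        long nr;        /* syscall number */
        long arg0;      /* first syscall argument */
};

static inline void trace_syscall_entry(long nr, long arg0)
{
        /*
         * DO_TRACE expands to a per-site static struct kernel_trace plus
         * a cheap enabled check; the probe function is only called when
         * both the global and the per-event flags are set.
         */
        DO_TRACE(syscall_entry, "nr %ld arg0 %ld", nr, arg0);
}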

We also found that logging data to the unified trace buffer adds about
10% overhead. After taking a closer look, we found two sources of this
extra cost. One is the additional function calls introduced by the unified
trace buffer; that is why we are considering adding inline trace write
functions to the unified trace buffer. The other is the cost of obtaining
timestamps. The unified trace buffer code uses sched_clock to obtain
timestamps, and for some reason this function alone adds about 2~3%
overhead on my test machine.
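
For reference, the per-call cost of sched_clock can be estimated in
isolation with a trivial test module along these lines (a rough sketch
only, not part of the patch; the loop count is arbitrary and the result
includes the loop overhead):

#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/sched.h>        /* sched_clock() */
#include <asm/div64.h>          /* do_div() */

#define NR_CALLS        1000000

static int __init sched_clock_cost_init(void)
{
        unsigned long long start, end, delta;
        int i;

        start = sched_clock();
        for (i = 0; i < NR_CALLS; i++)
                (void)sched_clock();
        end = sched_clock();

        delta = end - start;
        do_div(delta, NR_CALLS);        /* rough ns per call */
        printk(KERN_INFO "sched_clock: ~%llu ns per call\n", delta);
        return 0;
}

static void __exit sched_clock_cost_exit(void)
{
}

module_init(sched_clock_cost_init);
module_exit(sched_clock_cost_exit);
MODULE_LICENSE("GPL");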

The above numbers were measured on one multi-core system.
I am going to run more performance tests on more systems and will
keep you updated.

Jiaying

On Tue, Dec 2, 2008 at 9:30 PM, Jiaying Zhang <jiayingz@xxxxxxxxxx> wrote:
>
> On Tue, Dec 2, 2008 at 8:35 PM, Steven Rostedt <srostedt@xxxxxxxxxx> wrote:
> >
> > On Tue, 2008-12-02 at 16:26 -0800, Jiaying Zhang wrote:
> >> Refer to the previous email
> >> "[RFC PATCH 0/3] A couple of feature requests to the unified trace
> >> buffer".
> >>
> >> A kernel tracing prototype that uses the unified trace buffer to keep
> >> the collected trace data.
> >>
> >
> >> Index: linux-2.6.26.2/include/linux/ktrace.h
> >> ===================================================================
> >> --- /dev/null   1970-01-01 00:00:00.000000000 +0000
> >> +++ linux-2.6.26.2/include/linux/ktrace.h   2008-11-25 11:32:37.000000000 -0800
> >> @@ -0,0 +1,73 @@
> >> +#ifndef _LINUX_KTRACE_H
> >> +#define _LINUX_KTRACE_H
> >> +
> >> +#include <linux/types.h>
> >> +
> >> +struct kernel_trace;
> >> +
> >> +typedef void ktrace_probe_func(struct kernel_trace *, void *, size_t);
> >> +
> >> +struct kernel_trace {
> >> +        const char *name;
> >> +        const char *format;         /* format string describing variable list */
> >> +        u32 enabled:1, event_id:31;
> >> +        ktrace_probe_func *func;    /* probe function */
> >> +        struct list_head list;      /* list head linked to the hash table entry */
> >> +} __attribute__((aligned(8)));
> >> +
> >> +extern int ktrace_enabled;
> >> +
> >> +/*
> >> + * Make sure the alignment of the structure in the __ktrace section will
> >> + * not add unwanted padding between the beginning of the section and the
> >> + * structure. Force alignment to the same alignment as the section start.
> >> + */
> >> +
> >> +#define DEFINE_KTRACE_STRUCT(name)  struct ktrace_struct_##name
> >> +
> >> +#ifdef CONFIG_KTRACE
> >> +#define DO_TRACE(name, format, args...)                                 \
> >> +        do {                                                            \
> >> +                static const char __kstrtab_##name[]                    \
> >> +                __attribute__((section("__ktrace_strings")))            \
> >> +                = #name "\0" format;                                    \
> >> +                static struct kernel_trace __ktrace_##name              \
> >> +                __attribute__((section("__ktrace"), aligned(8))) =      \
> >> +                { __kstrtab_##name, &__kstrtab_##name[sizeof(#name)],   \
> >> +                  0, 0, NULL, LIST_HEAD_INIT(__ktrace_##name.list) };   \
> >> +                __ktrace_check_format(format, ## args);                 \
> >> +                if (unlikely(ktrace_enabled) &&                         \
> >> +                        unlikely(__ktrace_##name.enabled)) {            \
> >> +                        struct ktrace_struct_##name karg = { args };    \
> >> +                        (*__ktrace_##name.func)                         \
> >> +                                (&__ktrace_##name, &karg, sizeof(karg)); \
> >> +                }                                                       \
> >
> > This looks like another form of markers/trace-points. Why not use them?
> Yes. It borrows many ideas from the marker code. But we don't intend
> to be as general as markers and lttng, which allows us to make many
> simplifications that we think will give us performance benefits.
>
> Jiaying
> >
> >> +        } while (0)
> >> +
> >> +#else /* !CONFIG_KTRACE */
> >> +#define DO_TRACE(name, format, args...)
> >> +#endif
> >> +
> >
> > -- Steve
> >
> >
> >