Re: [PATCH] markers-linker-generic

From: Mathieu Desnoyers
Date: Wed Apr 11 2007 - 13:51:29 EST


* Russell King (rmk+lkml@xxxxxxxxxxxxxxxx) wrote:
> On Tue, Apr 10, 2007 at 06:36:58PM -0400, Mathieu Desnoyers wrote:
> > Defines the linker marcro EXTRA_RWDATA for marker data section. It
> > puts the marker data in a separate section that will not pollute the
> > normal .data section, which minimize the cache impact. Markers need such
> > a special section because they define a lot of spreaded and small data
> > structures at multiple sites.
> >
> > This patch also creates the __markers_strings section (ro marker
> > strings) and makes sure the __markers section is aligned by putting it
> > before the __ksymtab_strings (not after).
>
> What's this marker stuff about?
>

Hi Russel,

Here is an overview :

This marker infrastructure is a hook-callback mechanism. It is
meant to have an impact as low as possible on the system performances
when no callback (probe) is connected so markers (hooks) can be
compiled into a production kernel without noticeable slowdown.

The rationale behind this mechanism the following :
1 - It makes sense to have instrumentation (for tracing, profiling)
within the kernel source tree so that it can follow its evolution.
Other options, such as kprobes, imply maintaining an external set of
instrumentation that must be adapted to each kernel version.
Although it may make sense for distributions, it is not well suited
for kernel developers, since they rarely work on a major
distribution image.
2 - kprobes, although being a very good attempt at providing a dynamic
hooking mechanism that has no impact when disabled, suffers from
important limitations :
a - It cannot access local variables of a function at a particular
point within its body that will be consistent thorough the kernel
versions without involving a lot of recurrent hair-pulling.
b - Kprobes is slow, since it involves going though a trap each time
a probe site is executed. Even though the djprobes project made a
good effort to make things faster, it cannot currently instrument
fully-preemptible kernels and does not solve (1), (2a) and (2c).
c - On the reentrancy side, going though a trap (thus playing with
interrupt enable/disable) and taking spinlocks are not suited to
some code paths, i.e. :
kernel/lockdep.c, printk (within the lockdep_on()/lockdep_off()).
It must be understood that some code paths interesting for
instrumentation often present a particular reentrancy challenge.

Some more details :

The probe callback connection to its markers is done dynamically. A
predicted branch is used to skip the hook stack setup and function call
when the marker is "disabled" (no probe is connected). Further
optimizations can be implemented for each architecture to make this
branch faster.

Instrumentation of a subsystem becomes therefore a straightforward task.
One has to add instrumentation within the key locations of the kernel
code in the following form :

MARK(subsystem_event, "%d %p", myint, myptr);

I will be glad to discuss more specific questions that may arise about
the markers.

Regards,

Mathieu


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/