Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

From: Mathieu Desnoyers
Date: Fri Jan 18 2008 - 17:26:53 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
>
> On Thu, 17 Jan 2008, Frank Ch. Eigler wrote:
>
> > Hi -
> >
> > On Thu, Jan 17, 2008 at 03:08:33PM -0500, Steven Rostedt wrote:
> > > [...]
> > > + trace_mark(kernel_sched_schedule,
> > > + "prev_pid %d next_pid %d prev_state %ld",
> > > + prev->pid, next->pid, prev->state);
> > > [...]
> > > But...
> > >
> > > Tracers that want to do a bit more work, like recording timings and seeing
> > > if we hit some max somewhere, can't do much with that pretty print data.
> >
> > If you find yourself wanting to perform computations like finding
> > maxima, or responding right there as opposed to later during userspace
> > trace data extraction, then you're trending toward a tool like
> > systemtap.
>
> Yes, very much so. I'm working on getting the latency_tracer from the -rt
> patch into something suitable for mainline. We need to calculate the max
> latencies on the fly. If we hit a max, then we save it off, otherwise, we
> blow away the trace and start again.
>
> >
> > > [...]
> > > So, at a minimum, I'd like to at least have meta data attached:
> > > trace_mark(kernel_sched_schedule,
> > > "prev_pid %d next_pid %d prev_state %ld\0"
> > > "prev %p next %p",
> > > prev->pid, next->pid, prev->state,
> > > prev, next);
> > >
> > > This would allow for both the nice pretty print of your trace, as well as
> > > allowing other tracers to get to better meta data.
> >
> > Yes, more self-contained marker events are necessary for meaningful
> > in-situ processing. That needs to be balanced by the increased cost
> > for computing and passing the extra parameters, multiplied by the event
> > occurrence rate.
>
> The cost is only paid when the marker is armed, since the marker is an
> unlikely() branch and its code will be placed at the end of the function.
>
> >
> > In this case, the prev/next pointers are sufficient to compute the
> > other values. For particularly performance-critical markers, it may
> > not be unreasonable to expect the callback functions to dereference
> > such pointers for pretty-printing or other processing.
>
> This was exactly my point to Mathieu, but I think he has LTTng very much
> coupled with the markers. I haven't played with LTTng (yet), but from what
> I've read (Mathieu, correct me if I'm wrong), it seems that all the
> markers become visible to userspace, and the user can simply turn them on
> or off. LTTng doesn't need any knowledge of the marker since the marker
> contains how to print the information.
>
> So* by placing "prev %p next %p" as the only information, we lose out on
> the automated way LTTng works, because the two pointers are just
> meaningless numbers to the user.
>

Exactly. At the marker site, we have:

- a marker identifier
- a format string containing field names and types
- arguments

I would like to keep that as closely aligned as possible with
what ends up in the trace.
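To make these three pieces concrete, here is a rough user-space model of a marker site. It assumes a simplified structure: `struct marker`, `sketch_trace_mark()`, `marker_call()` and `print_probe()` are hypothetical names and do not match the kernel's actual marker implementation. It shows why a generic tracer needs no per-marker knowledge (the probe prints straight from the format string) and why the arguments cost nothing while the marker is disarmed (they are only evaluated inside the unlikely() branch).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical user-space model of a marker site; the real kernel
 * structure and trace_mark() macro differ in detail. */
typedef void (*marker_probe_fn)(const char *name, const char *fmt,
                                va_list args);

struct marker {
        const char *name;       /* marker identifier */
        const char *format;     /* field names and types */
        int armed;              /* non-zero once a probe is connected */
        marker_probe_fn probe;  /* callback invoked when armed */
};

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Out-of-line slow path: only reached when the marker is armed. */
static void marker_call(struct marker *m, ...)
{
        va_list args;

        va_start(args, m);
        m->probe(m->name, m->format, args);
        va_end(args);
}

/* Fast path: a single predicted-not-taken branch while disarmed;
 * the arguments are only evaluated inside the branch. */
#define sketch_trace_mark(m, ...)                               \
        do {                                                    \
                if (unlikely((m)->armed))                       \
                        marker_call((m), __VA_ARGS__);          \
        } while (0)

/* Example generic probe: pretty-prints using only the format string,
 * with no knowledge of what the marker means. */
static char last_line[128];

static void print_probe(const char *name, const char *fmt, va_list args)
{
        int n = snprintf(last_line, sizeof(last_line), "%s: ", name);

        vsnprintf(last_line + n, sizeof(last_line) - n, fmt, args);
}
```

With a marker set up for kernel_sched_schedule, sketch_trace_mark(&m, 1, 2, 0L) does nothing until m.armed is set; afterwards the probe produces "kernel_sched_schedule: prev_pid 1 next_pid 2 prev_state 0" without ever having been told what the fields are.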

However, I see that it limits what can be done by in-kernel tracers. And
by the way, I also suffer from the same kind of limitation in LTTng. Here
is an example:

I would like to replace blktrace (actually, I already have a fairly
complete implementation). However, some blktrace-specific code runs in
the kernel to "prepare" the information for the trace. Since this code
does not need to run when tracing is disabled, it can be seen as
"glue-code" between the kernel tracing point and the extraction of the
trace data.

What looked like the least intrusive solution was to create inline
functions consisting of branches over code considered unlikely (which
could be a function call), where the glue-code is executed to prepare the
data. It's a bit like what the markers do, except that there is no
associated marker name and no format string: the subsystem being traced
must enable its tracing features by itself (e.g. through a /proc file).
That makes sense, since this type of code has to be subsystem-specific
anyway.
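The inline-function approach could look roughly like the following. It is a user-space sketch with hypothetical names (`blk_glue_enabled`, `blk_glue_hook()`, `blk_glue_prepare()`, `struct blk_glue_record`) and is not the actual blktrace code. The subsystem owns its own enable switch, and the only cost while tracing is off is one branch.

```c
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical per-subsystem switch, e.g. toggled through a /proc file. */
static int blk_glue_enabled;

/* Hypothetical record built by the subsystem-specific glue-code. */
struct blk_glue_record {
        unsigned long long sector;
        unsigned int bytes;
        char rw;                /* 'R' or 'W' */
};

static struct blk_glue_record last_record;
static int records_emitted;

/* Out-of-line glue-code: prepares the data only when tracing is on.
 * A real version would then call trace_mark() and/or in-kernel tracers. */
static void blk_glue_prepare(unsigned long long sector, unsigned int bytes,
                             int is_write)
{
        last_record.sector = sector;
        last_record.bytes = bytes;
        last_record.rw = is_write ? 'W' : 'R';
        records_emitted++;
}

/* Inline hook placed in the subsystem: no marker name, no format string,
 * just a branch over the unlikely glue-code call. */
static inline void blk_glue_hook(unsigned long long sector,
                                 unsigned int bytes, int is_write)
{
        if (unlikely(blk_glue_enabled))
                blk_glue_prepare(sector, bytes, is_write);
}
```

The design choice is that the enable switch and the record layout stay private to the subsystem, so nothing about them becomes part of a user-visible marker interface.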

But I have not seen a lot of situations where that kind of glue-code was
needed, so I think it makes sense to keep markers simple to use and
efficient for the common case.

Then, in this glue-code, we can put trace_mark() and calls to in-kernel
tracers.

Since the markers are eventually meant to become an API visible from
user-space, I think it makes sense to keep it clean. If an in-kernel
tracer needs extra information, I think it would make sense for it to
get it through a mechanism that does not expose that information to
user-space.

What do you think?


> >
> > > The '\0' would keep your tracer from recording the extra data, and we
> > > could add some way to ignore the parameters in the printf to let other
> > > tracers get straight to the meta data.
> >
> > This \0 hack is perhaps too clever. Much of the cost of the extra
> > parameters is already paid by the time that a simpleminded tracing
> > callback function starts going through the string. Also, I believe
> > the systemtap marker interface would break if the format strings were
> > not singly terminated ordinary strings.
>
> Well, actually, when I first wrote this letter, I used "--" as a delimiter
> to allow a tool to hide the pretty stuff. But then I thought about the
> "clever hack" with the '\0'. The "--" may be better since it won't break
> systemtap.
>

It could be done, I guess. But it looks a bit ugly. :) I would rather
export the "pretty stuff" through an interface not involving markers.
Or, if there is a way to separate the "callback" mechanism from the
"export to user-space" part of the marker API, I am open to
proposals.
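For comparison, the two delimiter ideas can be sketched from a consumer's point of view. These are hypothetical user-space helpers (`meta_after_nul()`, `meta_after_dashes()` and `pretty_len()` are not an existing API), assuming the metadata part simply follows the pretty-print part. The sketch also shows the problem with the '\0' variant: strlen() and printf() stop at the first NUL, so any consumer treating the format as an ordinary C string never sees the metadata.

```c
#include <string.h>

/* '\0' variant: returns the metadata hidden after the first NUL.
 * Only valid if the literal really contains an embedded '\0';
 * otherwise this reads past the end of the string. */
static const char *meta_after_nul(const char *fmt)
{
        return fmt + strlen(fmt) + 1;
}

/* "--" variant: the format stays one ordinary string, and a tool can
 * hide the part after the delimiter when pretty-printing. */
static const char *meta_after_dashes(const char *fmt)
{
        const char *sep = strstr(fmt, " -- ");

        return sep ? sep + 4 : NULL;
}

/* Length of the pretty-print part, i.e. everything before " -- ". */
static size_t pretty_len(const char *fmt)
{
        const char *sep = strstr(fmt, " -- ");

        return sep ? (size_t)(sep - fmt) : strlen(fmt);
}
```

Both variants make every consumer parse a convention that is not part of the format string itself, which is the ugliness noted above.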

Mathieu

> -- Steve
>
> * dvhart - bah!
>

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68