Re: [RFC] convert ftrace syscall tracer to TRACE_EVENT()

From: Frédéric Weisbecker
Date: Sat May 09 2009 - 08:54:15 EST


2009/5/9 Ingo Molnar <mingo@xxxxxxx>:
>
> * Jason Baron <jbaron@xxxxxxxxxx> wrote:
>
>> Hi,
>>
>> I've been thinking about converting the current ftrace syscall
>> tracer to the TRACE_EVENT() macros. There are a few issues with
>> the current syscall tracer approach:
>>
>> 1) It has to be enabled for all processes and all syscalls. By
>> moving to TRACE_EVENT(), it can be enabled/disabled per tracepoint
>> and can also make use of the generic tracing filters, such as
>> "trace all processes for pid x"
>>
>> 2) Other tracers cannot tie into it, since it's not tracepoint
>> based. TRACE_EVENT() fixes this.
>>
>> 3) Data formatting. I don't believe the syscall tracer understands
>> all the various types for output formatting. By moving to
>> TRACE_EVENT(), we can print out a more readable syscall trace.
>>
>> 4) The ftrace syscall tracer needs new arch-specific code for
>> each architecture. By converting to TRACE_EVENT() we don't need
>> any architecture-specific code.
>>
>> Other issues to consider:
>>
>> * Maintenance. The current syscall tracer automatically picks up
>> new syscalls. The TRACE_EVENT() will be harder to initially set
>> up. But once it's done, syscalls are obviously not added often. So
>> I don't think this will be too bad.
>>
>> * Performance. The current syscall tracer adds a
>> 'test_thread_flag()' to syscall entry/exit. The TRACE_EVENT()
>> would add a per-syscall global to check. So they are going to have
>> different cache profiles...however, the tracepoint infrastructure
>> is hopefully moving to the 'immediate' value work, which will make
>> this more highly optimized.
>>
>> I've also tested the patch shown below (which uses
>> DECLARE_TRACE() as a preliminary proof of concept), using
>> getpid() in a loop, and tbench, and saw very small performance
>> differences. Obviously we would have to do more extensive testing
>> before deciding.
>>
>> Patch is pretty rough, but should give a rough sense of what the
>> DECLARE_TRACE() type patch might look like...
>
> Yeah, i very much agree with the direction. (I've Cc:-ed Tom Zanussi
> who also has expressed interest in this.)
>
> I'm not sure about the implementation as you've posted it though:
>
> Firstly, it adds two new tracepoints to every system call. That is
> unnecessary - we already have the TIF flag based callbacks, and we
> can use the existing syscall attributes table to get to tracepoints
> - without slowing down (or impacting) the fast path in any way.



Agreed, that's unnecessary: we already hook into the ptrace path without
impacting the off-case.
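
To make the idea concrete, here is a minimal user-space sketch (not kernel
code) of the flag-gated dispatch being discussed: the entry path tests a single
per-thread flag, and only when it is set does it consult a per-syscall table.
Every name in it (syscall_meta, thread_trace_flag, trace_sys_enter,
syscall_entry) is hypothetical and chosen for illustration; in the kernel the
flag would be a TIF bit tested with test_thread_flag().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SYSCALLS 4	/* hypothetical: tiny table for the demo */

/* Hypothetical stand-in for an entry of the syscall attributes table. */
struct syscall_meta {
	const char *name;
	bool enabled;		/* per-syscall switch */
	unsigned long hits;	/* number of times the probe fired */
};

static struct syscall_meta syscall_table[NR_SYSCALLS] = {
	{ "read",   false, 0 },
	{ "write",  false, 0 },
	{ "open",   false, 0 },
	{ "getpid", false, 0 },
};

/* Stand-in for a TIF-style per-thread flag. */
static bool thread_trace_flag;

/* Slow path: only reached when the thread flag is set. */
static void trace_sys_enter(int nr)
{
	assert(nr >= 0 && nr < NR_SYSCALLS);
	if (syscall_table[nr].enabled)
		syscall_table[nr].hits++;
}

/* Entry path: a single flag test, no per-syscall tracepoint. */
static void syscall_entry(int nr)
{
	if (thread_trace_flag)
		trace_sys_enter(nr);
}
```

With thread_trace_flag clear, syscall_entry() costs one branch and never
touches the table; setting the flag and enabling a table entry is what makes
a given syscall show up.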



> Secondly, we should reuse the information we get in SYSCALL_DEFINE,
> to construct the TRACE_EVENT tracepoints directly - without having
> to list all syscalls again in a separate file.


Indeed. That's not trivial, but it is feasible.
I'm not sure we can reuse the TRACE_EVENT macro directly inside SYSCALL_DEFINE,
though. The storm of nested macro expansions that would result confuses me, and
I have trouble imagining the outcome.
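
As a rough illustration of reusing the information available at the definition
site, here is a user-space sketch in which one macro emits both the function
and a small metadata record describing it. This is only an assumption about
what such a scheme could look like: DEMO_SYSCALL_DEFINE1 and struct
syscall_metadata are hypothetical names, the macro handles exactly one
argument, and it does not invoke TRACE_EVENT() at all.

```c
#include <stddef.h>

/* Hypothetical record built from the arguments given to the macro. */
struct syscall_metadata {
	const char *name;
	int nb_args;
	const char *const *types;	/* argument types, as strings */
	const char *const *args;	/* argument names, as strings */
};

/*
 * Hypothetical one-argument variant, loosely modelled on the shape of
 * SYSCALL_DEFINE1(name, type, arg). The stringizing operator (#) turns
 * the type and argument tokens into strings for the metadata record.
 */
#define DEMO_SYSCALL_DEFINE1(sname, t1, a1)				\
	static const char *const types_##sname[] = { #t1 };		\
	static const char *const args_##sname[] = { #a1 };		\
	static const struct syscall_metadata meta_##sname = {		\
		.name = "sys_" #sname,					\
		.nb_args = 1,						\
		.types = types_##sname,					\
		.args = args_##sname,					\
	};								\
	static long sys_##sname(t1 a1)

/* Dummy body for the demo: pretend only fd 3 is open. */
DEMO_SYSCALL_DEFINE1(close, unsigned int, fd)
{
	return fd == 3 ? 0 : -1;
}
```

Here meta_close carries the name "sys_close" plus the type and name of its
single argument, so a generic tracer could format the call without a second,
hand-maintained list of syscalls.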