Re: [RFC] convert ftrace syscall tracer to TRACE_EVENT()

From: Roland McGrath
Date: Mon May 11 2009 - 22:47:44 EST


> Firstly, it adds two new tracepoints to every system call. That is
> unnecessary - we already have the TIF flag based callbacks, and we
> can use the existing syscall attributes table to get to tracepoints
> - without slowing down (or impacting) the fast path in any way.

This is one of the key differences of this approach. It has very different
trade-offs. I'm afraid that you might be sweeping this issue under the rug
inadvertently. I see two major thrusts of Jason's proposal, and I think we
should be clear about each of those on its own separate merits.

#1 is the mechanism for getting to a tracing path.

If you use TIF_SYSCALL_TRACE (or new equivalents) then this is a choice you
make for the task (or all tasks, or whichever subset you choose). This
means every system call in that task takes the slow path for tracing.
(The slow path is slow primarily to enable fetching and changing all user
registers, which is not needed for just tracing syscall arguments/results.)
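To make the per-task cost concrete, here is a toy userspace model (not
kernel code). Everything in it is an assumption made for illustration:
struct task, its syscall_trace field (standing in for TIF_SYSCALL_TRACE),
and do_syscall() are hypothetical names. The only point is that the check
never looks at which syscall is being made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for syscall numbers. */
enum { DEMO_NR_READ, DEMO_NR_REBOOT };

/* Toy task: one flag standing in for TIF_SYSCALL_TRACE, plus counters. */
struct task {
    bool syscall_trace;
    unsigned long slow_path_entries;
    unsigned long fast_path_entries;
};

/* Model of syscall entry: the path is chosen per task, not per syscall. */
static void do_syscall(struct task *t, int nr)
{
    assert(t != NULL);
    (void)nr; /* the flag check ignores the syscall number */

    if (t->syscall_trace)
        t->slow_path_entries++; /* every call by a flagged task lands here */
    else
        t->fast_path_entries++;
}
```

In this model a flagged task takes the slow path for sys_reboot and
sys_read alike, while an unflagged task never leaves the fast path.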

Conversely, an actual tracepoint in a syscall function or its wrapper
always affects every task, but only affects that particular syscall's code
path. If the tracepoint on sys_reboot is enabled, that has no effect
whatsoever on the paths taken by any task's sys_read calls. OTOH, if the
sys_read tracepoint is enabled (with whatever filtering), that makes each
and every sys_read call by each and every task go through the tracepoint
callback path. The "collateral damage" overhead paid by "uninteresting"
tasks (whose tracepoint hits are all filtered out) is whatever cost the
filtering code has.
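The opposite trade-off can be sketched the same way. Again this is a toy
userspace model under stated assumptions, not the kernel's tracepoint
implementation: struct demo_tracepoint, its filter_pid field, and
syscall_entry() are hypothetical. It counts how often the callback path
runs versus how often a hit survives the filter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for syscall numbers. */
enum { DEMO_NR_READ, DEMO_NR_REBOOT, DEMO_NR_SYSCALLS };

/* Toy per-syscall tracepoint with a single-pid filter. */
struct demo_tracepoint {
    bool enabled;
    int filter_pid;          /* only hits from this pid are recorded */
    unsigned long callbacks; /* times the callback path ran at all */
    unsigned long recorded;  /* hits that passed the filter */
};

static struct demo_tracepoint demo_tp[DEMO_NR_SYSCALLS];

/* Model of syscall entry: the path is chosen per syscall, for all tasks. */
static void syscall_entry(int pid, int nr)
{
    assert(nr >= 0 && nr < DEMO_NR_SYSCALLS);
    struct demo_tracepoint *tp = &demo_tp[nr];

    if (!tp->enabled)
        return;              /* other syscalls' tracepoints cost nothing here */

    tp->callbacks++;         /* paid by every task making this syscall */
    if (pid == tp->filter_pid)
        tp->recorded++;      /* only the "interesting" task is recorded */
}
```

With only the read tracepoint enabled, the difference between callbacks
and recorded is the collateral damage: calls by uninteresting tasks that
still ran the filter. Calls to the reboot stand-in never touch the
callback path.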

#2 is the richness of the method for handling syscall arguments.

(I have the impression this one was Jason's motivation.) The new(ish)
syscall definition macros make it easy(ish) to pull out parameter types and
names statically at kernel build time, and to do intelligent things with
those. As Jason is already exploring in his second pass, this can be
exploited either with direct tracepoints or via the syscall register
values fetched with syscall_get_arguments().
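
As a rough illustration of what "statically at build time" can mean, here
is a userspace sketch of a definition macro that records parameter types
and names as strings next to the function it defines. It is only loosely
modeled on the idea behind the syscall definition macros; DEMO_SYSCALL3,
struct syscall_meta, and find_arg() are hypothetical names, and the real
macros are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata record, filled in entirely by the preprocessor. */
struct syscall_meta {
    const char *name;
    int nargs;
    const char *types[3];
    const char *args[3];
};

/* Define a 3-argument function and, alongside it, a table of its
 * parameter types and names produced by stringifying the macro arguments. */
#define DEMO_SYSCALL3(name, t1, a1, t2, a2, t3, a3)               \
    static const struct syscall_meta meta_##name = {              \
        #name, 3, { #t1, #t2, #t3 }, { #a1, #a2, #a3 }            \
    };                                                            \
    static long demo_##name(t1 a1, t2 a2, t3 a3)

DEMO_SYSCALL3(read, unsigned int, fd, char *, buf, size_t, count)
{
    (void)fd;
    (void)buf;
    return (long)count; /* placeholder body; a real handler does the work */
}

/* Look up a parameter by name; returns its index, or -1 if absent. */
static int find_arg(const struct syscall_meta *meta, const char *arg)
{
    assert(meta != NULL && arg != NULL);
    for (int i = 0; i < meta->nargs; i++)
        if (strcmp(meta->args[i], arg) == 0)
            return i;
    return -1;
}
```

A tracer holding such a table could pair each raw argument value, however
it was obtained, with its declared type and name without any per-syscall
hand-written formatting code.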


Thanks,
Roland