Re: [RFC] convert ftrace syscall tracer to TRACE_EVENT()

From: Mathieu Desnoyers
Date: Sat May 09 2009 - 11:24:49 EST


* Frédéric Weisbecker (fweisbec@xxxxxxxxx) wrote:
> 2009/5/9 Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx>:
> > * Ingo Molnar (mingo@xxxxxxx) wrote:
> >>
> >> * Frédéric Weisbecker <fweisbec@xxxxxxxxx> wrote:
> >>
> >> > > I would expect to use copy_string_from_user (for strings) and
> >> > > copy_from_user for structures, because without any strings
> >> > > (especially), the trace information becomes much less useful.
> >> >
> >> > Yeah, for structures we would just need the copy_from_user.
> >>
> >> There's just a few places (mainly related to VFS APIs) where we
> >> really want to do that, and there we want to do it a bit later, not
> >> at syscall time: we want to do it after the getname(), to output a
> >> stable (and already copied to kernel space) copy of the file name.
> >>
> >> So the right solution there would be to add special, case-by-case
> >> tracepoints to those few places. We don't need strings for the
> >> majority of the 300+ system calls that exist on Linux.
> >>
> >>       Ingo
> >
> > Hrm, this is an important design decision. I cover a lot of those sites
> > in my LTTng instrumentation, and this is clearly one way to do it, at
> > the expense of adding tracepoints in many kernel locations when there
> > could be a functional equivalent with syscall instrumentation.
>
>
> Yeah, these tracepoints defined from DEFINE_SYSCALL are a good way
> to proceed generically.
> For specific cases, we can later add some upper layer, such as described below.
>
>
> > The thing we would need to do it from the syscall tracing site is a
> > table to map the system call numbers to their specific types (for the
> > syscalls we care about) and therefore which would also map to a
> > serialisation function to extract the parameters and write the correct
> > content into the trace buffers.
>
>
> I would rather see this keyed on the type of a parameter than on the
> syscall.
> The same specific complex type can be used by several syscalls.
>

Agreed.
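
To make the type-keyed idea concrete, here is a minimal user-space sketch.
It is only an illustration under stated assumptions: every name in it
(arg_type, serialise_fn, syscall_desc, format_syscall, demo_timespec) is
hypothetical and not an existing kernel interface, and it formats text
where a real tracer would write binary records into the trace buffers.
Strings are assumed to be already copied to kernel space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parameter type tags; these are not kernel identifiers. */
enum arg_type { ARG_LONG, ARG_STRING, ARG_TIMESPEC, ARG_TYPE_MAX };

/* Stand-in for a structure that several syscalls take by pointer. */
struct demo_timespec { long tv_sec; long tv_nsec; };

/* A serialiser formats one argument into buf and returns the bytes used. */
typedef size_t (*serialise_fn)(char *buf, size_t len, const void *arg);

/* Convert an snprintf() result into the number of bytes actually stored. */
static size_t stored(int n, size_t len)
{
    if (n < 0 || len == 0)
        return 0;
    return (size_t)n < len ? (size_t)n : len - 1;
}

static size_t ser_long(char *buf, size_t len, const void *arg)
{
    return stored(snprintf(buf, len, "%ld", *(const long *)arg), len);
}

static size_t ser_string(char *buf, size_t len, const void *arg)
{
    return stored(snprintf(buf, len, "\"%s\"", (const char *)arg), len);
}

static size_t ser_timespec(char *buf, size_t len, const void *arg)
{
    const struct demo_timespec *ts = arg;

    return stored(snprintf(buf, len, "{%ld, %ld}",
                           ts->tv_sec, ts->tv_nsec), len);
}

/* The table is keyed by parameter type, not by syscall number. */
static const serialise_fn serialisers[ARG_TYPE_MAX] = {
    [ARG_LONG]     = ser_long,
    [ARG_STRING]   = ser_string,
    [ARG_TIMESPEC] = ser_timespec,
};

/* Per-syscall metadata then only needs to list the parameter types. */
struct syscall_desc {
    const char *name;
    int nargs;
    enum arg_type types[6];
};

static size_t format_syscall(char *buf, size_t len,
                             const struct syscall_desc *d,
                             const void *const args[])
{
    size_t off = stored(snprintf(buf, len, "%s(", d->name), len);

    for (int i = 0; i < d->nargs; i++) {
        assert(d->types[i] < ARG_TYPE_MAX);
        if (i > 0)
            off += stored(snprintf(buf + off, len - off, ", "), len - off);
        off += serialisers[d->types[i]](buf + off, len - off, args[i]);
    }
    off += stored(snprintf(buf + off, len - off, ")"), len - off);
    return off;
}
```

Two syscalls sharing a complex type then share one serialiser, and adding
a syscall costs one syscall_desc entry rather than a new function.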

> If we want even better precision, we can also pair that with a syscall
> mapping for specific post-processing at output time. For example, to
> print O_RDONLY instead of the matching number.
>

Yep.
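
As a rough sketch of that output-time step, the following user-space code
turns an open() flags value into symbolic names. It assumes the POSIX
constants from <fcntl.h>; the table and the helper names (flag_name,
open_flag_names, append, decode_open_flags) are made up for illustration
and cover only a handful of flags. Note that O_RDONLY is 0 on Linux, so
the access mode has to be decoded through O_ACCMODE rather than by
testing a bit.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical value-to-name table; deliberately incomplete. */
struct flag_name { int value; const char *name; };

static const struct flag_name open_flag_names[] = {
    { O_CREAT,    "O_CREAT"    },
    { O_EXCL,     "O_EXCL"     },
    { O_TRUNC,    "O_TRUNC"    },
    { O_APPEND,   "O_APPEND"   },
    { O_NONBLOCK, "O_NONBLOCK" },
};

/* Append s to the NUL-terminated string in buf, truncating if needed. */
static void append(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);

    assert(len > 0 && used < len);
    snprintf(buf + used, len - used, "%s", s);
}

static const char *decode_open_flags(int flags, char *buf, size_t len)
{
    size_t i;

    assert(len > 0);
    buf[0] = '\0';

    /* The access mode is a small field, not a set of independent bits. */
    switch (flags & O_ACCMODE) {
    case O_RDONLY: append(buf, len, "O_RDONLY"); break;
    case O_WRONLY: append(buf, len, "O_WRONLY"); break;
    case O_RDWR:   append(buf, len, "O_RDWR");   break;
    default:       append(buf, len, "O_ACCMODE?"); break;
    }
    flags &= ~O_ACCMODE;

    /* The remaining flags are independent bits. */
    for (i = 0; i < sizeof(open_flag_names) / sizeof(open_flag_names[0]); i++) {
        if (flags & open_flag_names[i].value) {
            append(buf, len, "|");
            append(buf, len, open_flag_names[i].name);
            flags &= ~open_flag_names[i].value;
        }
    }

    /* Anything the table does not know about is printed as hex. */
    if (flags) {
        char hex[32];

        snprintf(hex, sizeof(hex), "|0x%x", (unsigned int)flags);
        append(buf, len, hex);
    }
    return buf;
}
```

Doing this at output time keeps the recorded event a plain integer, so
the cost of the string lookup is not paid on the traced path.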

>
> >
> > We could also use getname()/putname() in the syscall tracing primitive.
> > Note that architectures like x86_64 need some tweaks I have in my
> > patchset to correctly ensure that syscall entry/exit are always paired.
> > This is required because we change the thread flag synchronously with
> > thread execution upon activation/deactivation.
>
>
> Not sure I understand your point here. The only resulting problem of such
> a race would be rare unpaired syscall exit or entry traces... Is it that
> important?
>

If we have non-symmetric getname()/putname(), it will cause bogus ref
counting, and will leak memory.

Having non-matching syscall entry/exit is OK as long as tracing has no
side-effect on the rest of the kernel (e.g. only using local variables).
If we start playing with getname/putname for synchronization, we have to
be extra careful, because we start modifying external state.
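
A toy user-space model of the problem, for illustration only. It is not
how my patchset handles it, and all names (demo_task, demo_getname,
demo_putname, the naive_* and paired_* hooks, simulate) are hypothetical
stand-ins. A counter plays the role of the external state modified by
getname()/putname(), and the trace flag is flipped while the syscall is
in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the external state touched by getname()/putname(). */
static int names_outstanding;

struct demo_task {
    bool trace_flag;  /* stand-in for the per-thread syscall-trace flag */
    bool holds_name;  /* set only by the entry hook that took a name */
};

static void demo_getname(void) { names_outstanding++; }
static void demo_putname(void) { names_outstanding--; }

/* Naive hooks: each one independently consults the current flag. */
static void naive_entry(struct demo_task *t)
{
    if (t->trace_flag)
        demo_getname();
}

static void naive_exit(struct demo_task *t)
{
    if (t->trace_flag)
        demo_putname();
}

/* Paired hooks: the exit side only undoes what the entry side did. */
static void paired_entry(struct demo_task *t)
{
    assert(!t->holds_name);  /* this model has no nested syscalls */
    if (t->trace_flag) {
        demo_getname();
        t->holds_name = true;
    }
}

static void paired_exit(struct demo_task *t)
{
    if (t->holds_name) {
        demo_putname();
        t->holds_name = false;
    }
}

/* Run one syscall with the flag possibly changing while it executes. */
static int simulate(bool paired, bool flag_at_entry, bool flag_at_exit)
{
    struct demo_task t = { .trace_flag = flag_at_entry, .holds_name = false };

    names_outstanding = 0;
    if (paired)
        paired_entry(&t);
    else
        naive_entry(&t);

    t.trace_flag = flag_at_exit;  /* tracing toggled mid-syscall */

    if (paired)
        paired_exit(&t);
    else
        naive_exit(&t);

    return names_outstanding;  /* 0 means balanced */
}
```

With the naive hooks, disabling tracing mid-syscall leaves the count at 1
(a leak) and enabling it mid-syscall drives it to -1 (a put without a
get). The paired hooks stay at 0 in both cases.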

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68