Re: [PATCH 08/12] add trace events for each syscall entry/exit

From: Frederic Weisbecker
Date: Tue Aug 25 2009 - 20:19:18 EST


On Tue, Aug 25, 2009 at 03:51:11PM -0400, Mathieu Desnoyers wrote:
> * Frederic Weisbecker (fweisbec@xxxxxxxxx) wrote:
> > On Tue, Aug 25, 2009 at 02:31:19PM -0400, Mathieu Desnoyers wrote:
> > > (Well, I do not have time currently to look into the gory details
> > > (sorry), but let's try to take a step back from the problem.)
> > >
> > > The design proposal for this kthread behavior wrt syscalls is based on a
> > > very specific and current kernel behavior, that may happen to change and
> > > that I have actually seen proven incorrect. For instance, some
> > > proprietary Linux driver does very odd things with system calls within
> > > kernel threads, like invoking them with int 0x80.
> > >
> > > Yes, this is odd, but do we really want to tie the tracer that much to
> > > the actual OS implementation specificities ?
> >
> >
> > I really can't see the point in doing this. I don't expect the kernel
> > behaviour to change soon and start issuing explicit syscall interrupts
> > from within itself. It's not about a current kernel implementation fashion,
> > it's about kernel design sanity that is not likely to go backward.
> >
> > Is it worth it to trace kernel threads, maintain their tracing
> > specificities (such as workarounds with ret_from_fork that implies)
> > just because we want to support tracing on some silly proprietary drivers?
> >
> >
> > >
> > > That sounds like a recipe for endless breakages and missing bits of
> > > instrumentation.
> > >
> > > So my advice would be: if we want to trace the syscall entry/exit paths,
> > > let's trace them for the _whole_ system, and find ways to make it work
> > > for corner-cases rather than finding clever ways to diminish
> > > instrumentation coverage.
> >
> >
> > If developers of out-of-tree drivers want to implement buggy things
> > that would never be accepted after a minimal review here, and then instrument
> > their bugs, then I would suggest they implement their own ad hoc instrumentation,
> > really :-/
> >
> > What's the point in supporting out of tree bugs?
> >
> > Well, the only advantage of doing this would be to support reverse engineering
> > in tiny and rare corner cases. Not that worth the effort.
> >
> >
> > > Given the ret from fork example happens to be the first event fired
> > > after the thread is created, we should be able to deal with this problem
> > > by initializing the thread structure used by syscall exit tracing to an
> > > initial "ret from fork" value.
> > >
> > > Mathieu
> >
> >
> > It means we have to support and check this corner case in every arch
> > that supports syscall tracing, deal with crashes because we omitted it, etc...
> >
> > For all the things I've explained above I don't think it's worth the effort.
> >
> > But it's just my opinion...
> >
>
> Then we might want to explicitly require that calls to sys_*() system
> calls made from within the kernel pass through another instrumentation
> mechanism. IMHO, that would make sense. It would cover both system calls
> made from kernel threads and system calls made from within a system call
> or trap.
>
> Mathieu


Well, we can't really set a tracepoint in each sys_*() function. Or more
precisely, we already have them: they are automagically generated and rely
on the sysenter ptrace path.

But if we want to check which syscalls are called from kernel threads, we have:

- kthread() -> do_exit()


The entry point of every kernel thread (except "kthreadd") is
kthread(), which calls do_exit() at the end.

If we want to trace the exit of a kernel thread, we can put
a tracepoint there rather than in do_exit(), whose results would
be intermixed with sys_exit() tracing.


- kthreadd :: create_kthread() -> kernel_thread() -> do_fork()


A kernel thread is created when the kthreadd thread forks.
If we want to trace the creation of kernel threads, we can again do that
at the upper level: kernel_thread().

But does that tell us who created the thread? All we would see
is kthreadd forking. That is very poor information compared
to a userspace fork(), which tells us who really created the new process.

Instead, what we probably want is to trace kthread_create(), which queues
the thread creation request for the kthreadd thread, so that we know
_who_ asked for the thread (the requesting process and the callsite).
That is much richer in information.

Well, you can even climb to an upper layer and see whether this is a
workqueue, a kernel/async.c thread, a slow work, etc...


- kernel_execve() -> sys_execve()

We can execute user apps from the kernel through call_usermodehelper().
We can trace kernel_execve() or, again, an upper layer such as
call_usermodehelper().

- ... I guess there are other examples

The kernel calls syscalls through wrappers, and tracing these wrappers,
at whichever layer gives the level of information we want (choose your
layer), is much more verbose and richer in information.
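
To make the point about layers concrete, here is a rough userspace model
in C. It is not kernel code and not a proposed patch. Every name in it
(model_kthread_create(), trace_thread_create_request(), struct
create_event, and so on) is made up for illustration. It only shows that
a probe at the fork level sees "kthreadd", while a probe in the wrapper
can record the requester and the callsite.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one thread-creation request (illustration only). */
struct create_event {
	char requester[16];	/* task that asked for the new thread */
	const char *callsite;	/* "file:line" of the request */
	char thread_name[16];	/* name requested for the new thread */
};

#define MAX_EVENTS 8
static struct create_event events[MAX_EVENTS];
static int nr_events;

/* Stand-in for a tracepoint placed in the wrapper layer. */
static void trace_thread_create_request(const char *requester,
					const char *callsite,
					const char *name)
{
	struct create_event *ev;

	if (nr_events >= MAX_EVENTS)
		return;		/* buffer full: drop the event */
	ev = &events[nr_events++];
	snprintf(ev->requester, sizeof(ev->requester), "%s", requester);
	ev->callsite = callsite;
	snprintf(ev->thread_name, sizeof(ev->thread_name), "%s", name);
}

/*
 * Model of the low-level path: the fork is always performed by the
 * daemon, so a probe placed here can only ever report "kthreadd".
 */
static const char *model_fork_from_daemon(void)
{
	return "kthreadd";
}

/* Wrapper layer: this is where the requester is still known. */
static const char *model_kthread_create_at(const char *requester,
					   const char *callsite,
					   const char *name)
{
	trace_thread_create_request(requester, callsite, name);
	return model_fork_from_daemon();
}

/* Capture the callsite automatically as a "file:line" string. */
#define STR2(x) #x
#define STR(x) STR2(x)
#define model_kthread_create(requester, name) \
	model_kthread_create_at((requester), __FILE__ ":" STR(__LINE__), (name))
```

Calling model_kthread_create("modprobe", "worker/0") from anywhere stores
one event carrying the requester name, the requested thread name and the
file:line of the call, while the value returned from the fork-level model
is still just "kthreadd". The same pattern would apply one layer up
(workqueue, async, ...) if that is the level of detail you are after.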
