Re: [git pull] tracing fixes

From: Ingo Molnar
Date: Fri Jul 18 2008 - 04:42:20 EST



* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Thu, 17 Jul 2008, Ingo Molnar wrote:
> >
> > Ingo Molnar (4):
> > ftrace: fix merge buglet
> > ftrace: fix lockup with MAXSMP
> > ftrace: do not trace scheduler functions
> > ftrace: do not trace library functions
> >
>
> [...]
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -11,8 +11,6 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o profile.o \
> > hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
> > notifier.o ksysfs.o pm_qos_params.o sched_clock.o
> >
> > -CFLAGS_REMOVE_sched.o = -mno-spe
> > -
> > ifdef CONFIG_FTRACE
> > # Do not trace debug files and internal ftrace files
> > CFLAGS_REMOVE_lockdep.o = -pg
> > @@ -21,6 +19,7 @@ CFLAGS_REMOVE_mutex-debug.o = -pg
> > CFLAGS_REMOVE_rtmutex-debug.o = -pg
> > CFLAGS_REMOVE_cgroup-debug.o = -pg
> > CFLAGS_REMOVE_sched_clock.o = -pg
> > +CFLAGS_REMOVE_sched.o = -mno-spe -pg
> > endif
> >
>
> Ingo,
>
> Why not trace the scheduler functions? I found a lot of useful
> information from seeing what functions are being called (namely the
> latencies caused by the fair scheduler balancing). Not being able to
> trace sched.c seems to keep a lot of useful data from being accessed.

I agree in general, but it was causing lockups with this config:

http://redhat.com/~mingo/misc/config-Thu_Jul_17_13_34_52_CEST_2008

Note the MAXSMP option in the config, which sets NR_CPUS to 4096:

CONFIG_NR_CPUS=4096

Our randconfig testing stumbled on it. MAXSMP is a debug helper to "tune
up the kernel for as large systems as possible", so it can expose
regressions that are not normally seen.

After spending a good 4 hours figuring out the lib/*.o details, I didn't
have the stamina to find the exact reason within sched.o :-)

One thing that needs looking at: ftrace's self-recursion checks are not
as robust as they used to be, and this is a recent regression (within
the last 1-2 weeks). Why do we have to exclude tsc.o from tracing, for
example? Why isn't cpu_clock() called inside a recursion-protected
section? Why are all the trace function callbacks called outside of the
recursion checks? Why aren't ftrace lockups debuggable via the NMI
watchdog plus early printk? I think it would be more robust to do the
recursion check as early as possible on tracer entry.

> also, is the '-mno-spe' safe when ftrace is not configured?

Why exactly was -mno-spe added? I haven't seen it explained in the
commit that added its removal:

| commit 6ec562328fda585be2d7f472cfac99d3b44d362a
| Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
| Date: Wed May 14 21:30:30 2008 -0400
|
| ftrace: use the new kbuild CFLAGS_REMOVE for kernel directory

It describes a cleanup, but it also adds a -mno-spe removal that wasn't
there before. This seems to be PowerPC-specific, and the exact context
is not clear to me.

Ingo