Re: [PATCH 00/16] [GIT PULL] tracing: fixes/cleanups, nostop-machine, update stack tracer
From: Steven Rostedt
Date: Mon Jan 02 2012 - 14:33:25 EST
Hi Ingo,
Thanks for looking into this.
On Sun, 2012-01-01 at 19:09 +0100, Ingo Molnar wrote:
> Hm, i'm seeing spontaneous reboot crashes with these commits
> applied. The reboots just happen out of the blue during an
> allyesconfig bootup (config attached), and they happen at
> random places.
Interesting. This patch set only deals with function tracing (and stack
tracer which is on top of the function tracer). It shouldn't affect
anything if function tracing is never set. With a few exceptions...
>
> In about 50% of the cases the kernel boots up fine - so
> bisection is very difficult - i have to repeat the bootup 10
> times to have a reasonable confidence during a 5-step bisection
> process.
>
> I've narrowed it down the commits in the above tree:
>
> 38b78eb85540: tracing: Factorize filter creation
This one just touches the filter code. Unless you are doing filtering,
this code should never be executed. Are you running perf? I believe
perf can trigger it too.
> 762e1207889b: tracing: Have stack tracing set filtered functions at boot
This one just adds a kernel command line parameter that allows stack
trace filtering to be done at boot up. Do you have any function tracing
things enabled in your kernel command line?
> 2a85a37f168d: ftrace: Allow access to the boot time function enabling
This one just changes a name of a function and makes it global.
> d2d45c7a03a2: tracing: Have stack_tracer use a separate list of functions
This one touches the way stack tracer sets up what functions to trace,
and should not be touched if stack tracer is never updated.
> 69a3083c4a7d: ftrace: Decouple hash items from showing filtered functions
This is a simple change that adds another enum to the way records are
stored in the function filters. Again, should never be touched unless
you are filtering functions in the function tracer.
> fc13cb0ce452: ftrace: Allow other users of function tracing to use the output listing
This one just moves some code around to replace the global function
filter with a descriptor that other users (like perf and the stack
tracer) can use to select which functions to trace. Again, this is only
used by function filtering. The dynamic ftrace start up test does
exercise this, so maybe the error was there?
> 06a51d930738: ftrace: Create ftrace_hash_empty() helper routine
This is a simple clean up that also fixes a bug in the way function
filtering works. Again, this should only be touched by function filters.
> c842e975520f: ftrace: Fix ftrace hash record update with notrace
This fixes a bug in the function filter logic. Again, this is caused by
different tracers filtering functions for the function tracer.
> 5855fead9cc3: ftrace: Use bsearch to find record ip
This code isn't even used in x86 (not yet). But it is needed for powerpc
(coming soon).
> 68950619f8c8: ftrace: Sort the mcount records on each page
This sorts the mcount tables used by function tracer.
> 85ae32ae019b: ftrace: Replace record newlist with record page list
> a79008755497: ftrace: Allocate the mcount record pages as groups
These two are clean up code that does affect all kernels when the
function tracer is updated.
> 3208230983a0: ftrace: Remove usage of "freed" records
The "freed" records are affected when modules are unloaded. But an
allyesconfig shouldn't be unloading modules.
> c88fd8634ea6: ftrace: Allow archs to modify code without stop machine
This is a little restructuring of the updates of ftrace calls (nops), so
it does affect boot up.
> 45959ee7aa64: ftrace: Do not function trace inlined functions
This patch changes the way inlined functions are handled when
CONFIG_OPTIMIZE_INLINING is set. I wonder if this is the problem patch.
It forces all functions marked as "inline" to be notraced as well. But
perhaps this is affecting the way gcc does stuff. We have had issues
when gcc decides not to inline something.
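For illustration only, a minimal sketch of what "inline implies notrace"
means at the compiler level. It assumes notrace maps to gcc's
no_instrument_function attribute; the macro name traceless_inline and
the helper functions are hypothetical and are not taken from the patch.

```c
/* Assumption: notrace is gcc's attribute that suppresses the
 * profiling call (mcount) normally emitted under -pg. */
#define notrace __attribute__((no_instrument_function))

/* Hypothetical macro: every function declared with it is both
 * inline and notrace, so if gcc decides NOT to inline it, the
 * out-of-line copy still carries no profiling call. */
#define traceless_inline inline notrace

static traceless_inline int add_one(int x)
{
	return x + 1;
}

/* An ordinary function: it stays traceable when built with -pg. */
static int add_two(int x)
{
	return add_one(add_one(x));
}
```

The point of the sketch is that the attribute changes only the
instrumentation, not the result: add_two(1) still returns 3 whether or
not gcc inlines add_one().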
> 30fb6aa74011: ftrace: Fix unregister ftrace_ops accounting
This may affect the kernel when function tracing is stopped. The
selftests should trigger this.
>
> -tip commit 1bc2a3035df2 appears to be fine, it survived 10
> consecutive reboots.
>
> So i'm a bit stuck - i wanted to pull this and the NMI bits.
> I'll test the NMI bits independently as well, excluding the
> above commits, so that we can at least move forward on that
> topic.
The NMI bits should be on a separate branch. That is, if they are OK,
they should not affect this code. And this code does not require the NMI
code. I left the one patch that converts x86 to non-stop-machine out of
this patch set so that the infrastructure can get in first.
Interesting that these patches cause an issue. Besides the inline one
that I mentioned, the rest should not cause races. They all deal with
the set up and tear down of the ftrace function filtering. If there was
a bug, it should crash reliably.
Maybe there's an off-by-one in the sorting. I'll take a look. If there
is, it may corrupt some other data. This is the first time I used the
in-kernel sort and bsearch algorithms. But again, I've been running this
code on my boxes without issues.
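As a rough userspace illustration of the sort-then-bsearch pattern and
where an off-by-one could corrupt neighbouring data, here is a sketch
using libc's qsort() and bsearch() rather than the in-kernel helpers.
struct rec, cmp_rec(), sort_recs() and find_rec() are hypothetical names
and are not the ftrace code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mcount record; only the address matters. */
struct rec {
	unsigned long ip;
};

/* Comparator shared by qsort() and bsearch(): order records by ip. */
static int cmp_rec(const void *a, const void *b)
{
	const struct rec *ra = a;
	const struct rec *rb = b;

	if (ra->ip < rb->ip)
		return -1;
	if (ra->ip > rb->ip)
		return 1;
	return 0;
}

/* Sort exactly `count` records. Passing count + 1 here would be the
 * kind of off-by-one that swaps memory past the end of the table. */
static void sort_recs(struct rec *recs, size_t count)
{
	assert(recs != NULL || count == 0);
	qsort(recs, count, sizeof(*recs), cmp_rec);
}

/* Look up a record by ip; the table must already be sorted with
 * the same comparator. Returns NULL when the ip is not present. */
static struct rec *find_rec(struct rec *recs, size_t count, unsigned long ip)
{
	struct rec key = { .ip = ip };

	return bsearch(&key, recs, count, sizeof(*recs), cmp_rec);
}
```

A caller would run sort_recs() once over the table and then use
find_rec() for each lookup, always passing the same element count to
both.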
I'll add your config and boot my box with it and see how it goes. Note,
I won't be doing much more today, as I'm still on PTO.
Thanks!
-- Steve