[PATCH 0/2] [git pull] tip updates for 2.6.29

From: Steven Rostedt
Date: Wed Feb 18 2009 - 22:38:31 EST



Ingo,

I found the cause of the hard lock up you were seeing. It is one
of those cases where a new patch does not create a bug, but unveils
one. The change that showed the bug was:

e68746a: ftrace: enable filtering only when a function is filtered on

The bug was there all along, but his change revealed it. There were
two bugs actually.

1) The function tracer is useless without KALLSYMS. Without KALLSYMS
you will only get hex values for your function traces.
This also totally breaks the dynamic function tracer. It depends
on having names to compare to select functions.

2) In the self test, there is a while loop that consumes the buffer
and will not end until the buffer is empty. If we still have a
producer present, this becomes an infinite loop.

Both bugs, together with the patch mentioned above, are needed for the
lock up. Without the patch, the function filter is activated
whenever we pass in a filter, even if we do not select any function.
The patch changes that to only activate the filter if we succeed in
selecting a function.

Back to the bugs.

Without KALLSYMS and without the patch, we never select a function, but
we still activate the filter. This causes all functions to be disabled
from tracing.
The dynamic ftrace self test fails because it never sees the selected
function get traced.

With the patch and without KALLSYMS selected, we now do not activate
the filter, because no function was selected (all compares of a given
name to a NULL pointer will fail). Now all functions are still enabled
to be traced.
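
As a rough illustration of the two behaviours, here is a small userspace
model. It is a sketch only and not the ftrace code: struct func_rec,
rec_matches() and traceable_after_filter() are made-up names, and a NULL
name stands in for a kernel built without KALLSYMS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not kernel code: one traceable function record. */
struct func_rec {
    const char *name;   /* NULL models a kernel built without KALLSYMS */
    int selected;       /* set when the filter pattern matched this record */
};

/* Compare a record against the filter; a NULL name never matches. */
static int rec_matches(const struct func_rec *rec, const char *pattern)
{
    return rec->name != NULL && strcmp(rec->name, pattern) == 0;
}

/*
 * Apply a filter and return how many functions remain traceable.
 * only_if_selected == 0 models the behaviour before e68746a (the filter
 * is activated whenever one is passed in); 1 models the behaviour after.
 */
static size_t traceable_after_filter(struct func_rec *recs, size_t n,
                                     const char *pattern, int only_if_selected)
{
    size_t i, selected = 0, traceable = 0;
    int filter_active;

    assert(recs != NULL && pattern != NULL);

    for (i = 0; i < n; i++) {
        recs[i].selected = rec_matches(&recs[i], pattern);
        selected += recs[i].selected;
    }

    /* After the patch, an empty selection leaves the filter off. */
    filter_active = only_if_selected ? (selected > 0) : 1;

    for (i = 0; i < n; i++)
        if (!filter_active || recs[i].selected)
            traceable++;

    return traceable;
}
```

With every name NULL, the old behaviour leaves nothing traceable (so the
self test fails early), while the new behaviour leaves everything
traceable, which is what exposes the second bug.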

So, what happens? The dynamic function tracer self test will call
the test routine while the tracer is still on. The self test will
start consuming all the cpu ring buffers to test them, and will not
end until they are all finished. But you also have RCU_TORTURE selected.
The RCU torture test will run, filling up the ring buffer on other
CPUs. The consumer will never catch up, and we run forever!

Both of these are true bugs that have been in ftrace for a long time.
I think they are candidates for getting in 29, even this late in
the game. You never know what other config combination can hit these
bugs.

The fixes are simple. One is to simply disable the ring buffer
while the consumer runs. This prevents any producer from keeping
the consumer from finishing. The other is to make the function
tracer select KALLSYMS.
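
The ring buffer side can be pictured with a small userspace model. This
is a sketch under stated assumptions, not the code in trace_selftest.c:
struct ring_model, produce(), drain() and drain_with_recording_off() are
made-up names, and a step limit stands in for "runs forever" so that the
example terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a ring buffer: only the entry count matters. */
struct ring_model {
    size_t entries;         /* entries waiting to be consumed */
    int record_enabled;     /* producers may add entries only when nonzero */
};

/* A producer on another CPU (e.g. a torture test) tries to add entries. */
static void produce(struct ring_model *rb, size_t count)
{
    if (rb->record_enabled)
        rb->entries += count;
}

/*
 * Consume until empty, as the self test's loop does. Each iteration
 * removes one entry, then lets the producer add produce_per_step entries.
 * Returns the number of iterations used, or -1 if the buffer was still
 * not empty after max_steps.
 */
static long drain(struct ring_model *rb, size_t produce_per_step,
                  long max_steps)
{
    long steps = 0;

    assert(rb != NULL && max_steps > 0);

    while (rb->entries > 0) {
        if (steps == max_steps)
            return -1;
        rb->entries--;
        produce(rb, produce_per_step);
        steps++;
    }
    return steps;
}

/* The fix in miniature: stop recording, drain, restore the old state. */
static long drain_with_recording_off(struct ring_model *rb,
                                     size_t produce_per_step, long max_steps)
{
    int saved = rb->record_enabled;
    long steps;

    rb->record_enabled = 0;
    steps = drain(rb, produce_per_step, max_steps);
    rb->record_enabled = saved;
    return steps;
}
```

With a producer that keeps pace with the consumer, drain() hits the step
limit without ever seeing an empty buffer. With recording switched off
first, the same buffer empties in as many steps as it had entries.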

And yes, this was a bitch to debug. This was all I did today :-(

Please pull the latest tip/tracing/urgent tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/urgent


Steven Rostedt (2):
      tracing: disable tracing while testing ring buffer
      tracing: have function trace select kallsyms

----
 kernel/trace/Kconfig          |    2 ++
 kernel/trace/trace_selftest.c |    9 +++++++++
 2 files changed, 11 insertions(+), 0 deletions(-)
--