Re: [PATCH 0/2] [git pull] tip updates for 2.6.29

From: Steven Rostedt
Date: Thu Feb 19 2009 - 13:07:33 EST



On Thu, 19 Feb 2009, Ingo Molnar wrote:
> >
> > So, what happens? The dynamic function tracer self test will call
> > the test routine while the tracer is still on. The self test will
> > start consuming all the cpu ring buffers to test them, and will not
> > end until they are all finished. But you also have RCU_TORTURE selected.
> > The RCU torture test will run, filling up the ring buffer on other
> > CPUs. The consumer will never catch up, and we run forever!
>
> Can this problem hit other types of consumers - the
> /debug/tracing/ ones?

No, because that consumer is just a process, and it can be preempted. In
fact, this is normal producer/consumer behavior. It was only a bug here
because it happened during the self tests. The self tests run at boot, and
boot does not continue until they have finished. That makes the self test a
special case: it is a "reader" inside the boot code. All other readers are
user tasks. The only other kernel reader is ftrace_dump, but it disables all
of ftrace (ring buffers and all) before dumping, and it only runs on crashes
anyway.
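To make the difference concrete, here is a minimal single-threaded toy model in C. It is not the kernel's ring buffer code; toy_rb, drain_until_empty() and read_batch() are hypothetical names. A reader that loops until the buffer is empty, like the boot-time self test, never returns if the other CPUs write at least as fast as it reads. A reader that takes at most one batch per call and then returns, like a user task doing read(), always finishes its call. The max_steps cap only stands in for "forever".

```c
#include <assert.h>

/* Hypothetical toy ring buffer; NOT the kernel's ring buffer API. */
#define RB_SIZE 8

struct toy_rb {
	int data[RB_SIZE];
	unsigned int head;   /* next slot to write */
	unsigned int tail;   /* next slot to read */
	unsigned int count;  /* entries currently stored */
};

/* Producer in overwrite mode: when full, the oldest entry is dropped. */
static void toy_rb_write(struct toy_rb *rb, int v)
{
	if (rb->count == RB_SIZE) {
		rb->tail = (rb->tail + 1) % RB_SIZE;
		rb->count--;
	}
	rb->data[rb->head] = v;
	rb->head = (rb->head + 1) % RB_SIZE;
	rb->count++;
	assert(rb->count <= RB_SIZE);
}

/* Consumer: returns 1 and stores the entry in *v, or 0 if empty. */
static int toy_rb_read(struct toy_rb *rb, int *v)
{
	if (rb->count == 0)
		return 0;
	*v = rb->data[rb->tail];
	rb->tail = (rb->tail + 1) % RB_SIZE;
	rb->count--;
	return 1;
}

/*
 * Self-test style reader: keep reading until the buffer is empty.
 * After each read, "other CPUs" write writes_per_read new events.
 * Returns the number of reads, or -1 once max_steps is exceeded
 * (a stand-in for the real loop spinning forever).
 */
static int drain_until_empty(struct toy_rb *rb, int writes_per_read,
			     int max_steps)
{
	int v, steps = 0, seq = 0;

	while (toy_rb_read(rb, &v)) {
		if (++steps > max_steps)
			return -1;
		for (int i = 0; i < writes_per_read; i++)
			toy_rb_write(rb, seq++);
	}
	return steps;
}

/*
 * User-task style reader: consume at most batch entries, then return,
 * like a single read() of a trace file. It always terminates.
 */
static int read_batch(struct toy_rb *rb, int batch)
{
	int v, n = 0;

	while (n < batch && toy_rb_read(rb, &v))
		n++;
	return n;
}
```

For example, with five entries queued, drain_until_empty(&rb, 0, 1000) returns 5, while drain_until_empty(&rb, 1, 1000) returns -1: a producer that merely keeps pace with the reader is enough to stop the drain from ever seeing an empty buffer.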


>
> > Both of these are true bugs that have been in ftrace for a long time.
> > I think they are candidates for getting in 29, even this late in
> > the game. You never know what other config combination can hit these
> > bugs.
> >
> > The fixes are simple. One is to simply disable the ring buffer
> > while the consumer runs. This prevents any producer from keeping
> > the consumer from finishing. The other is to make the function
> > tracer select KALLSYMS.
> >
> > And yes, this was a bitch to debug. This was all I did today :-(
>
> Looks quite subtle indeed.
>
> It might be a safer approach to switch the self-test to
> exercise the actual /debug/tracing paths, instead of having its
> own home-brewed access methods. That way we debug all those
> facilities too - beyond having a self-test - and will avoid bugs
> like this too perhaps.


We could think of ways to redesign the self tests. But for now, they have
helped us find bugs. Again, I'm not sure we can change things much, because
this bug happened precisely because the reader was running at boot time and
was expected to finish, whereas the /debug/tracing code is accessed by user
tasks that can run for as long as they want.
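The ring buffer fix from the pull request (disable the ring buffer while the self-test consumer runs) can be sketched in the same kind of toy model. This is an illustration under stated assumptions, not the actual patch: the record_disabled counter and the toy_rb_* / selftest_drain() names are hypothetical. Writers check the counter and drop their events while it is nonzero, so the boot-time reader sees a buffer that can only shrink, and it terminates.

```c
#include <assert.h>

/* Hypothetical toy ring buffer; NOT the kernel's ring buffer API. */
#define RB_SIZE 8

struct toy_rb {
	int data[RB_SIZE];
	unsigned int head, tail, count;
	int record_disabled;  /* writers drop events while this is > 0 */
};

static void toy_rb_record_disable(struct toy_rb *rb) { rb->record_disabled++; }
static void toy_rb_record_enable(struct toy_rb *rb)  { rb->record_disabled--; }

/* Producer (overwrite mode). Returns 0 if the event was dropped. */
static int toy_rb_write(struct toy_rb *rb, int v)
{
	if (rb->record_disabled > 0)
		return 0;
	if (rb->count == RB_SIZE) {
		rb->tail = (rb->tail + 1) % RB_SIZE;
		rb->count--;
	}
	rb->data[rb->head] = v;
	rb->head = (rb->head + 1) % RB_SIZE;
	rb->count++;
	assert(rb->count <= RB_SIZE);
	return 1;
}

/* Consumer: returns 1 and stores the entry in *v, or 0 if empty. */
static int toy_rb_read(struct toy_rb *rb, int *v)
{
	if (rb->count == 0)
		return 0;
	*v = rb->data[rb->tail];
	rb->tail = (rb->tail + 1) % RB_SIZE;
	rb->count--;
	return 1;
}

/*
 * Fixed self-test reader: turn recording off, drain, turn it back on.
 * The "other CPUs" still try to write writes_per_read events per read,
 * but those writes are dropped, so the buffer can only shrink.
 */
static int selftest_drain(struct toy_rb *rb, int writes_per_read,
			  int max_steps)
{
	int v, steps = 0, seq = 0;

	toy_rb_record_disable(rb);
	while (toy_rb_read(rb, &v)) {
		/* Safety cap; with writes dropped, at most RB_SIZE reads occur. */
		if (++steps > max_steps) {
			steps = -1;
			break;
		}
		for (int i = 0; i < writes_per_read; i++)
			toy_rb_write(rb, seq++);  /* dropped: recording is off */
	}
	toy_rb_record_enable(rb);
	return steps;
}
```

A real SMP implementation would need an atomic counter and would have to cope with writers already in the middle of a commit when recording is turned off; this single-threaded model sidesteps both.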

-- Steve
