Re: rcu-torture boot hang

From: Steven Rostedt
Date: Wed Jun 13 2012 - 22:55:44 EST


Just a note. Please use my goodmis email and not my Red Hat email. My
Red Hat email is not checked as often, and I don't usually tag emails
there as "reply to". I only author with that email to give credit to the
one that pays me to do the work (and I don't mean just for the LWN
stats).

On Wed, Jun 13, 2012 at 08:49:00PM +0800, Fengguang Wu wrote:
> Hi Steven,
>
> On Tue, Jun 12, 2012 at 08:11:22PM +0800, Fengguang Wu wrote:
> > Hi Paul,
> >
> > In kernel boot tests with the attached config, I find the 3.5-rc2+
> > kernels all hang here:
> >
> > [ 6.522546] rcu-torture:--- Start of test: nreaders=2
> > nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0
> > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
>
> It turns out that commit 5963e317b1e9d2a4511503916d8fd664bb8fa8fb
> ("ftrace/x86: Do not change stacks in DEBUG when calling lockdep") is
> the root cause of this boot hang. The commit is reported by git bisect
> and confirmed to fix the boot hang by reverting in on 3.5-rc2.

I get the hang with your config too. If I disable both LOCKDEP and
IRQSOFF_TRACER which turns off TRACE_IRQFLAGS then the lockup goes away.
This is consistent with your findings, as the patch you found is only
enabled when TRACE_IRQFLAGS is on.

>
> [ rcu torture tests ]
>
> ...It would hang here before reverting the commit...
>
> [ 1.611901] Testing tracer function: PASSED

I did a bit more investigation and found that the problem comes from the
load_idt() command that is used to avoid reseting the debug stack. When
the function tracer is enabled it adds breakpoints to all locations that
it is about to trace in order to convert the nops in the functions into
calls to the tracer. But if the breakpoint handler is traced, the
breakpoint in the handler will reset the stack and cause a crash.

Your above observation is correct. Actually I added a printk in my
testing and found that the 'Testing trace function:' does output before
the lockup. I think printk changed recently that prevents it to print
out right away if a '\n' is not supplied. The function tracer self test
prints:

'Testing tracer function: '

Runs the test, and when it succeeds it prints 'PASSED\n'. Because the
first part wasn't printed, we never saw that the function tracer was
being tested when the lockup occurred.

This is a major flaw with the new printk(). It hides what is being
tested. I need to write a patch to create a fflush() for printk.

Anyway, load_idt() shouldn't be traced, but with PARAVIRT_GUEST it makes
functions that are even always_inlined not inlined, and allows things
like native_load_idt() to be traced, and that causes bad things as we
want to call load_idt to prevent recursion of the int3 handler.


If it works, can you give me your 'Tested-by'.

Thanks!

-- Steve

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index e5834aa..6a6d7ae 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -47,9 +47,9 @@
*/
#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
!defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
-# define inline inline __attribute__((always_inline))
-# define __inline__ __inline__ __attribute__((always_inline))
-# define __inline __inline __attribute__((always_inline))
+# define inline inline __attribute__((always_inline)) notrace
+# define __inline__ __inline__ __attribute__((always_inline)) notrace
+# define __inline __inline __attribute__((always_inline)) notrace
#else
/* A lot of inline functions can cause havoc with function tracing */
# define inline inline notrace
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/