Re: linux-next 20111025: warnings in rcu_idle_exit_common()/rcu_idle_enter_common()

From: Paul E. McKenney
Date: Tue Nov 01 2011 - 12:00:25 EST


On Tue, Nov 01, 2011 at 03:07:21PM +0800, Wu Fengguang wrote:
> On Tue, Nov 01, 2011 at 08:34:34AM +0800, Paul E. McKenney wrote:
> > On Mon, Oct 31, 2011 at 11:44:42AM -0400, Steven Rostedt wrote:
> > > On Mon, 2011-10-31 at 05:19 -0700, Paul E. McKenney wrote:
> > > > On Mon, Oct 31, 2011 at 07:41:42PM +0800, Wu Fengguang wrote:
> > > > > On Mon, Oct 31, 2011 at 06:43:25PM +0800, Wu Fengguang wrote:
> > > > > > On Mon, Oct 31, 2011 at 05:51:52PM +0800, Paul E. McKenney wrote:
> > > > > > > On Mon, Oct 31, 2011 at 04:26:34PM +0800, Wu Fengguang wrote:
> > > > > > > > Hi Paul,
> > > > > > > >
> > > > > > > > I got two warnings in rcutree.c. The last working kernels are
> > > > > > > > linux-next 20111014 and linux v3.1.
> > > > > > >
> > > > > > > Interesting. Could you please enable RCU event tracing at boot?
> > > > > >
> > > > > > Sorry I cannot...possibly due to another ftrace bug.
> > > > > >
> > > > > > > The RCU event tracing is at tracing/events/rcu/enable relative to
> > > > > > > the debugfs mount point at runtime, if that helps.
> > > > > >
> > > > > > > It's exactly the fact that linux-next 20111025 (compared to 20111014)
> > > > > > > no longer produces the trace events that made me look into the
> > > > > > > dmesg and find the warning from RCU (rather than the expected warning
> > > > > > > from ftrace).
> > > > > >
> > > > > > The trace output is now:
> > > > > >
> > > > > > # tracer: nop
> > > > > > #
> > > > > > # WARNING: FUNCTION TRACING IS CORRUPTED
> > > > > > # MAY BE MISSING FUNCTION EVENTS
> > > > > > # TASK-PID CPU# TIMESTAMP FUNCTION
> > > > > > # | | | | |
> > > > > > (nothing more)
> > > > >
> > > > > I checked the other test box and got the same warnings. Below is the
> > > > > full dmesg.
> > > > >
> > > > > > Not a single line of trace output again.
> > > >
> > > > Hmmm... I wonder if it is too early during boot for tracing to work
> > > > correctly.
> > > >
> > > > Gah! I have rcu/next set ahead to commits that are not supposed to go
> > > > upstream yet. I reset it back to match the stuff that is targeted for
> > > > the current merge window. Still need to find the bug, of course.
> > > >
> > > > Anyone have any idea why the kworker thread might be trying to enter
> > > > the idle loop? The idle_cpu(smp_processor_id()) call believes that
> > > > this is not the idle task. Or does x86 allow non-idle tasks to enter
> > > > the idle loop? Or to be migrated off-CPU?
> > >
> > >
> > > > It's not. Carsten Emde noticed what looked like a bug in ftrace last
> > > > week at LinuxCon, and looking deeper at it, I found that the swapper
> > > > task for all but CPU 0 is named kworker. That's because kworker creates
> > > > the idle task for all CPUs other than CPU 0, and the idle task takes
> > > > on the kworker name.
> > >
> > > Carsten posted a patch last week too:
> > >
> > > https://lkml.org/lkml/2011/10/26/313
> > >
> > > I'm glad that this bug shows up outside of just ftrace :)
> >
> > That makes one of us. ;-)
> >
> > Fengguang, does Carsten's patch help?
>
> Nope, unfortunately. Here is the new dmesg:

Hmmmm... Please see below for a diagnostic patch that prints out who
the kernel believes the idle thread is. Could you please give this
a go?

Thanx, Paul

------------------------------------------------------------------------

rcu: Add more information to the wrong-idle-task complaint

The current code just complains if the current task is not the idle task.
This commit therefore adds printing of the identity of the idle task.

Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index e0df33f..f4e7bc3 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -66,10 +66,14 @@ static void rcu_idle_enter_common(long long oldval)
 	}
 	RCU_TRACE(trace_rcu_dyntick("Start", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		RCU_TRACE(trace_rcu_dyntick("Error on entry: not idle task",
 					    oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 	rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
 }
@@ -116,10 +120,14 @@ static void rcu_idle_exit_common(long long oldval)
 	}
 	RCU_TRACE(trace_rcu_dyntick("End", oldval, rcu_dynticks_nesting));
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		RCU_TRACE(trace_rcu_dyntick("Error on exit: not idle task",
 					    oldval, rcu_dynticks_nesting));
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 }

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cc04876..2a8d9a6 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -356,10 +356,14 @@ static void rcu_idle_enter_common(struct rcu_dynticks *rdtp, long long oldval)
 	}
 	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		trace_rcu_dyntick("Error on entry: not idle task",
 				  oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 	/* CPUs seeing atomic_inc() must see prior RCU read-side crit sects */
 	smp_mb__before_atomic_inc();  /* See above. */
@@ -445,10 +449,14 @@ static void rcu_idle_exit_common(struct rcu_dynticks *rdtp, long long oldval)
 	WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
 	trace_rcu_dyntick("End", oldval, rdtp->dynticks_nesting);
 	if (!idle_cpu(smp_processor_id())) {
-		WARN_ON_ONCE(1);	/* must be idle task! */
+		struct task_struct *idle = idle_task(smp_processor_id());
+
 		trace_rcu_dyntick("Error on exit: not idle task",
 				  oldval, rdtp->dynticks_nesting);
 		ftrace_dump(DUMP_ALL);
+		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
+			  current->pid, current->comm,
+			  idle->pid, idle->comm); /* must be idle task! */
 	}
 }
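
------------------------------------------------------------------------

For anyone who wants to see the shape of the new diagnostic without
booting a kernel, here is a minimal user-space sketch. It is not kernel
code: struct task_model and warn_wrong_idle_once() are hypothetical names
made up for illustration, and the "is this the idle task?" test is
assumed to be a plain pointer comparison, which may well be simpler than
what idle_cpu() really does. The once-only behaviour of WARN_ONCE() is
imitated with a static flag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two task_struct fields the patch prints. */
struct task_model {
	int pid;
	char comm[16];
};

/*
 * Model of the patched check: if the current task is not the idle task,
 * format the same message the patch passes to WARN_ONCE(), but only the
 * first time.  Returns 1 if a message was written to buf, 0 otherwise.
 * Assumption: "is idle" is modelled as a simple pointer comparison.
 */
static int warn_wrong_idle_once(const struct task_model *cur,
				const struct task_model *idle,
				char *buf, size_t len)
{
	static int warned;	/* imitates the once-only latch */

	assert(cur && idle && buf);
	if (cur == idle || warned)
		return 0;
	warned = 1;
	snprintf(buf, len,
		 "Current pid: %d comm: %s / Idle pid: %d comm: %s",
		 cur->pid, cur->comm, idle->pid, idle->comm);
	return 1;
}
```

Calling it with the idle task as the current task stays silent; the
first call with any other task fills in the message, and later calls
stay silent again, which is why only one such line per boot should be
expected in dmesg from each of the patched call sites.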

