Bisected rcu hang (kernel/sched.c): was 2.6.33rc4 RCU hang mm spin_lock deadlock(?) after running libvirtd - reproducible.

From: Michael Breuer
Date: Sat Jan 23 2010 - 21:49:47 EST

On 01/13/2010 01:43 PM, Michael Breuer wrote:
[Originally posted as: "Re: 2.6.33RC3 libvirtd ->sky2 & rcu oops (was Sky2 oops - Driver tries to sync DMA memory it has not allocated)"]

On 1/11/2010 8:49 PM, Paul E. McKenney wrote:
On Sun, Jan 10, 2010 at 03:10:03PM -0500, Michael Breuer wrote:
On 1/9/2010 5:21 PM, Michael Breuer wrote:

Attempting to move back to mainline after my recent 2.6.32 issues...
Config is make oldconfig from my working 2.6.32 config. The patch for af_packet.c
(for the skb issue found in 2.6.32) is included. Attaching .config and NMI

System becomes unusable after bringing up the network:

RCU stall warnings are usually due to an infinite loop somewhere in the
kernel. If you are running !CONFIG_PREEMPT, then any infinite loop not
containing some call to schedule() will get you a stall warning. If you
are running CONFIG_PREEMPT, then the infinite loop is in some section of
code with preemption disabled (or irqs disabled).
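[Editor's note: as a minimal, purely hypothetical sketch of the kind of loop described above - not code from this report, and do_some_work() is a stand-in name - a kthread that spins without ever passing through the scheduler will eventually trip the stall detector on a !CONFIG_PREEMPT kernel, whereas a cond_resched() in the loop gives RCU its quiescent state:]

#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical illustration only; do_some_work() stands in for real work. */
static int busy_thread_fn(void *unused)
{
	while (!kthread_should_stop()) {
		do_some_work();		/* never blocks or schedules */

		/*
		 * Without this, a !CONFIG_PREEMPT kernel never reaches a
		 * quiescent state on this CPU, and the RCU CPU stall
		 * detector eventually starts printing warnings.
		 */
		cond_resched();
	}
	return 0;
}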

The stall-warning dump will normally finger one or more of the CPUs.
Since you are getting repeated warnings, look at the stacks and see
which of the most-recently-called functions stays the same in successive
stack traces. This information should help you finger the infinite (or
longer than average) loop.
I can now recreate this simply by "service start libvirtd" on an F12 box. My earlier report suggesting this had something to do with the sky2 driver was incorrect. Interestingly, it's always CPU1 whenever I start libvirtd.
Attaching two of the traces (I've got about ten, but they're all pretty much the same). They look pretty consistent - libvirtd on CPU1 is hung forking. Not sure why yet - perhaps someone who knows this code better than I do can jump in.
Summary of the hang: libvirtd forks, and two threads with the same pid show up deadlocked on a spin_lock.
Then if looking at the stack traces doesn't locate the offending loop,
bisection might help.
It would; however, it's going to be really difficult, as I wasn't able to get this far with rc1 & rc2 :(
Thanx, Paul

I was finally able to bisect this to commit: 3802290628348674985d14914f9bfee7b9084548 (see below)

Libvirtd always triggers the crash; other things that fork and use mmap sometimes do (vsftpd, for example).

Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> 2009-12-16 12:04:37
Committer: Ingo Molnar <mingo@xxxxxxx> 2009-12-16 13:01:56
Parent: e2912009fb7b715728311b0d8fe327a1432b3f79 (sched: Ensure set_task_cpu() is never called on blocked tasks)
Branches: remotes/origin/master
Follows: v2.6.32
Precedes: v2.6.33-rc2

sched: Fix sched_exec() balancing

Since we access ->cpus_allowed without holding rq->lock we need
a retry loop to validate the result, this comes for near free
when we merge sched_migrate_task() into sched_exec() since that
already does the needed check.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
LKML-Reference: <20091216170517.884743662@xxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

-------------------------------- kernel/sched.c --------------------------------
index 33d7965..63e55ac 100644
@@ -2322,7 +2322,7 @@ void task_oncpu_function_call(struct task_struct *p,
  *
  *  - fork, @p is stable because it isn't on the tasklist yet
  *
- *  - exec, @p is unstable XXX
+ *  - exec, @p is unstable, retry loop
  *
  *  - wake-up, we serialize ->cpus_allowed against TASK_WAKING so
  *    we should be good.
@@ -3132,21 +3132,36 @@ static void double_rq_unlock(struct rq *rq1, struct rq *rq2)
 }
 
 /*
- * If dest_cpu is allowed for this process, migrate the task to it.
- * This is accomplished by forcing the cpu_allowed mask to only
- * allow dest_cpu, which will force the cpu onto dest_cpu. Then
- * the cpu_allowed mask is restored.
+ * sched_exec - execve() is a valuable balancing opportunity, because at
+ * this point the task has the smallest effective memory and cache footprint.
  */
-static void sched_migrate_task(struct task_struct *p, int dest_cpu)
+void sched_exec(void)
 {
+	struct task_struct *p = current;
 	struct migration_req req;
+	int dest_cpu, this_cpu;
 	unsigned long flags;
 	struct rq *rq;
 
+again:
+	this_cpu = get_cpu();
+	dest_cpu = select_task_rq(p, SD_BALANCE_EXEC, 0);
+	if (dest_cpu == this_cpu) {
+		put_cpu();
+		return;
+	}
+
 	rq = task_rq_lock(p, &flags);
+	put_cpu();
+
+	/*
+	 * select_task_rq() can race against ->cpus_allowed
+	 */
 	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed)
-	    || unlikely(!cpu_active(dest_cpu)))
-		goto out;
+	    || unlikely(!cpu_active(dest_cpu))) {
+		task_rq_unlock(rq, &flags);
+		goto again;
+	}
 
 	/* force the process onto the specified CPU */
 	if (migrate_task(p, dest_cpu, &req)) {
@@ -3161,24 +3176,10 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)
 
 		return;
 	}
-out:
 	task_rq_unlock(rq, &flags);
 }
 
 /*
- * sched_exec - execve() is a valuable balancing opportunity, because at
- * this point the task has the smallest effective memory and cache footprint.
- */
-void sched_exec(void)
-{
-	int new_cpu, this_cpu = get_cpu();
-
-	new_cpu = select_task_rq(current, SD_BALANCE_EXEC, 0);
-	put_cpu();
-	if (new_cpu != this_cpu)
-		sched_migrate_task(current, new_cpu);
-}
-
-/*
  * pull_task - move a task from a remote runqueue to the local runqueue.
  * Both runqueues must be locked.
  */