Re: PROBLEM: 3.0-rc kernels unbootable since -rc3

From: Paul E. McKenney
Date: Tue Jul 12 2011 - 06:55:17 EST


On Mon, Jul 11, 2011 at 05:09:54PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 11, 2011 at 01:15:08PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 11, 2011 at 03:30:22PM -0400, Konrad Rzeszutek Wilk wrote:
> > > >
> > > > Hmmm... Does the stall repeat about every 3.5 minutes after the first stall?
> > >
> > > Starting Configure read-only root support...
> > > [ 81.335070] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60002 jiffies)
> > > [ 81.335091] sending NMI to all CPUs:
> > > [ 261.367071] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=240034 jiffies)
> > > [ 261.367092] sending NMI to all CPUs:
> > > [ 441.399066] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=420066 jiffies)
> > > [ 441.399089] sending NMI to all CPUs:
> >
> > OK, then the likely cause is something hanging onto the CPU. Do the later
> > stalls also show stack traces? If so, what shows up?
>
> I don't really get any stack traces from the guest. Not sure why it does
> not print them out (probably b/c the NMI functionality is not accessible
> somehow?). I get the stack traces using a 'xenctx' tool and this is what
> I get from the guest before the stall, and after the stall:
>
> 20:45:56 # 12 :/mnt/tmp/FC15-32/
> /usr/lib64/xen/bin/xenctx 29 -s System.map-3.0.0-rc6-disabled-options+ -a 2
> cs:eip: 0061:c042d0f5 task_waking_fair+0x14
> flags: 00001286 i s nz p
> ss:esp: 0069:e94cff0c
> eax: c18dbed0 ebx: ffffffff ecx: fff00000 edx: c14a10c0
> esi: 00000000 edi: 00000000 ebp: e94cff18
> ds: 007b es: 007b fs: 00d8 gs: 00e0
>
> cr0: 8005003b
> cr2: b7743000
> cr3: 97348001
> cr4: 00000660
>
> dr0: 00000000
> dr1: 00000000
> dr2: 00000000
> dr3: 00000000
> dr6: ffff0ff0
> dr7: 00000400
> Code (instr addr c042d0f5)
> c3 55 89 e5 57 56 53 3e 8d 74 26 00 8b 90 58 01 00 00 8b 7a 1c <8b> 72 20 8b 5a 18 8b 4a 14 39 f3
>
>
> Stack:
> c18dbed0 00000003 00000002 e94cff38 c0439a45 c18d00c0 c18dc2c0 00000000
> e8bd1ec4 e8bd1ef8 00000003 e94cff40 c0439b0c e94cff64 c042d4db 00000000
> e8bd1f04 00000001 00000001 e8bd1f00 e8bd0200 e8bd1efc e94cff80 c042ea69
> 00000000 00000000 e8bd1ef4 ea9c4918 c0a43a80 e94cff88 c0455e14 e94cffb4
>
> Call Trace:
> [<c042d0f5>] task_waking_fair+0x14 <--

Hmmm... This is a 32-bit system, isn't it?

Could you please add a check to the loop in task_waking_fair() and
do a printk() if the loop makes (say) more than 1000 passes without
exiting?
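
Something along the lines of the sketch below is what I have in mind.
It is a user-space stand-in, not a patch: it assumes the loop in question
is the 32-bit retry loop that re-reads min_vruntime until it matches
min_vruntime_copy, which is not confirmed anywhere in this thread. The
names struct fake_cfs_rq, read_min_vruntime_checked() and SPIN_WARN_LIMIT
are made up for illustration, fprintf() stands in for printk(), and the
C11 fence stands in for smp_rmb(). The max_passes bail-out exists only
so the demo terminates; the kernel loop would keep spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Threshold from the request above: "(say) more than 1000 passes". */
#define SPIN_WARN_LIMIT 1000UL

/* Hypothetical stand-in for the two fields the loop compares;
 * this is NOT the kernel's struct cfs_rq. */
struct fake_cfs_rq {
	volatile uint64_t min_vruntime;
	volatile uint64_t min_vruntime_copy;
};

/*
 * Re-read min_vruntime until it agrees with its copy, counting passes.
 * Warns once when the count exceeds SPIN_WARN_LIMIT.  Gives up after
 * max_passes (demo-only safety net) and returns the number of passes.
 */
static unsigned long read_min_vruntime_checked(const struct fake_cfs_rq *rq,
					       uint64_t *out,
					       unsigned long max_passes)
{
	unsigned long passes = 0;
	uint64_t val, copy;
	int warned = 0;

	assert(rq != NULL && out != NULL);

	do {
		copy = rq->min_vruntime_copy;
		/* Stand-in for smp_rmb() in the kernel. */
		atomic_thread_fence(memory_order_acquire);
		val = rq->min_vruntime;

		passes++;
		if (passes > SPIN_WARN_LIMIT && !warned) {
			/* In the kernel this would be a printk(). */
			fprintf(stderr,
				"retry loop stuck: %lu passes, val=%llx copy=%llx\n",
				passes,
				(unsigned long long)val,
				(unsigned long long)copy);
			warned = 1;
		}
	} while (val != copy && passes < max_passes);

	*out = val;
	return passes;
}
```

Printing both values once, rather than on every pass, should keep the
console usable while still showing whether the two fields ever converge.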

Thanx, Paul

> [<c0439a45>] try_to_wake_up+0xb2
> [<c0439b0c>] default_wake_function+0x10
> [<c042d4db>] __wake_up_common+0x3b
> [<c042ea69>] complete+0x3e
> [<c0455e14>] wakeme_after_rcu+0x10
> [<c048fd26>] __rcu_process_callbacks+0x172
> [<c048fe14>] rcu_process_callbacks+0x1e
> [<c044567d>] __do_softirq+0xa2