linux-next: boot failure for next-20111216

From: Stephen Rothwell
Date: Wed Dec 21 2011 - 19:21:24 EST


Hi all,

next-20111216 (and all later so far) fails to boot on a Power7 blade
server like this:

calling .serial8250_init+0x0/0x1e0 @ 1
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
INFO: rcu_sched detected stalls on CPUs/tasks: { 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 5, t=6002 jiffies)
Call Trace:
[c0000003fd9e74e0] [c000000000015054] .show_stack+0x74/0x1c0 (unreliable)
[c0000003fd9e7590] [c00000000012ad2c] .__rcu_pending+0x49c/0x4b0
[c0000003fd9e7650] [c00000000012ae94] .rcu_check_callbacks+0x154/0x250
[c0000003fd9e76f0] [c0000000000b8ab4] .update_process_times+0x44/0xa0
[c0000003fd9e7780] [c0000000000fcdbc] .tick_sched_timer+0x7c/0x100
[c0000003fd9e7820] [c0000000000d5b38] .__run_hrtimer+0xb8/0x270
[c0000003fd9e78d0] [c0000000000d6068] .hrtimer_interrupt+0x138/0x2d0
[c0000003fd9e79e0] [c00000000001e4d0] .timer_interrupt+0x130/0x300
[c0000003fd9e7a90] [c0000000000038e4] decrementer_common+0x164/0x180
--- Exception: 901 at .arch_local_irq_restore+0xd8/0x140
LR = .cpu_idle+0x180/0x220
[c0000003fd9e7d80] [c000000000c7c368] wireless_seq_fops+0x109a8/0x2f418 (unreliable)
[c0000003fd9e7e20] [c000000000017970] .cpu_idle+0x180/0x220
[c0000003fd9e7ed0] [c0000000008369e8] .start_secondary+0x3a4/0x3b0
[c0000003fd9e7f90] [c0000000000093f4] .start_secondary_prolog+0x10/0x14

There are quite a few more rcu stalls after this and the boot never
completes (at least not before our automated test system steps in and
reboots the server).

So the obvious first place to point the finger would be changes to the
8250 code ... but there is nothing too obvious there.

--
Cheers,
Stephen Rothwell sfr@xxxxxxxxxxxxxxxx
http://www.canb.auug.org.au/~sfr/

Attachment: pgp00000.pgp
Description: PGP signature