Re: [FIXED] Re: Total machine lockup w/ current kernels whileinstalling from CD

From: Andrew Morton
Date: Mon May 15 2006 - 16:44:54 EST


Bernhard Rosenkraenzer <bero@xxxxxxxxxxxx> wrote:
>
> On Thursday, 11. May 2006 03:22, Bernhard Rosenkraenzer wrote:
> > Hi,
> > I've built a CD that installs a customized system
> > [... crash at a random point ...]
> > BUG: soft lockup detected on CPU#0!
> >
> > Pid: 421, comm: kjournald
> > EIP: 0060:[<b01a2f52>] CPU: 0
> > EIP is at journal_commit_transaction+0x92e/0xfcc
> > EFLAGS: 00000297 Not tainted (2.6.16-rc6 #1)
> > EAX: 00000001 EBX: c2d34788 ECX: 00000001 EDX: c785e000
> > ESI: b3ff8d04 EDI: 000000f0 EBP: b683b840 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 0841f7fc CR3: 17217000 CR4: 000006d0
> > [<b02bd52e>] schedule+0x2ee/0x5b6
> > [<b01a6a88>] kjournald+0x201/0x213
> > [<b0111089>] smp_apic_timer_interrupt+0x32/0x49
> > [<b01a6937>] kjournald+0xb0/0x213
> > [<b01a5ffa>] commit_timeout+0x0/0x9
> > [<b012a789>] autoremove_wake_function+0x0/0x4b
> > [<b01a6887>] kjournald+0x0/0x213
> > [<b0101005>] kernel_thread_helper+0x5/0xb
>
> After backing out lots of changes, I've figured out the problem is caused by
> this bit of 2.6.16-rc6:
>
> diff -urN linux-2.6.16-rc5/kernel/sched.c linux-2.6.16-rc6/kernel/sched.c
> --- linux-2.6.16-rc5/kernel/sched.c 2006-05-11 20:04:18.000000000 +0200
> +++ linux-2.6.16-rc6/kernel/sched.c 2006-05-11 20:00:00.000000000 +0200
> @@ -4028,6 +4021,8 @@
> */
> if (unlikely(preempt_count()))
> return;
> + if (unlikely(system_state != SYSTEM_RUNNING))
> + return;
> do {
> add_preempt_count(PREEMPT_ACTIVE);
> schedule();

(That's cond_resched())

>
> The problem is that (to save a couple of bits of space), my simple installer
> was running inside an initrd -- and system_state isn't set to SYSTEM_RUNNING
> before linuxrc is executed --> scheduler breakage causes the oops.

ah-hah.

It's odd that we'll run initrds in a !SYSTEM_RUNNING state.

It's not an oops - it's sort-of a warning. Did the system actually
continue to run and boot up OK?

If so, I'd assume that the ext3 filesystem was mounted on a very slow
device - perhaps an IDE disk in PIO mode?

Perhaps we should poke the softlockup detector if someone called
cond_resched() when in a reschedulable state.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/