Re: PROBLEM: Consistent lock up on >=2.6.8

From: Andrew Morton
Date: Mon Oct 04 2004 - 03:39:43 EST


Jason Stubbs <jstubbs@xxxxxxxxxxxxx> wrote:
>
> I'm getting consistent hangs under heavy load on 2.6.8.1, 2.6.9-rc2 and
> 2.6.9-rc3. I have tested on 2.6.7 and have no problems at all. I can
> reproduce the problem in less than 10 minutes by repeatedly running a cpu
> intensive process such as openssl while the system performs its regular
> tasks.
>
> SysRq works and I was able to pull the registers (which are always very
> similar) and the last 50 lines of the task list. If more of the task list is
> necessary, I'll go out and buy a serial cable to get it. All information
> provided is from 2.6.9-rc3.
>
> ...
>
> Pid: 22433, comm: openssl
> EIP: 0060:[<c02d5082>] CPU: 0
> EIP is at _spin_lock+0xa/0x13
> EFLAGS: 00000286 Not tainted (2.6.9-rc3)
> EAX: c03b2500 EBX: 00000000 ECX: c031ca80 EDX: 00000000
> ESI: f8b9c926 EDI: ffffffff EBP: c03e1f44 DS: 007b ES: 007b
> CR0: 8005003b CR2: b7df91df CR3: 3705c000 CR4: 000006d0

Looks like a locking bug.

> Stack pointer is garbage, not printing trace

Sigh. That's exactly the info we wanted. Please try 2.6.9-rc3-mm1 which
should have that fixed. Or wait 15 minutes for 2.6.9-rc3-mm2.

Then do the sysrq-P trace again. You may need to type it a few times. If
you manage to get a backtrace of the process which is stuck in spin_lock,
that's the info we want.

You may find that all CPUs are stuck in spin_lock(). If so, please send
each CPU's backtrace. If you have to type it all in, don't worry about the
eight-digit hex numbers.
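
For anyone who captures the trace as text rather than retyping it, here is a minimal sketch of what dropping the eight-digit hex numbers amounts to. It assumes backtrace lines of the form " [<c02d5082>] _spin_lock+0xa/0x13"; that line format and the helper name strip_addresses are illustrative assumptions, not taken from the report above. Only the bracketed addresses are removed, so the symbol+offset part stays intact.

```python
import re

# Assumed line format (hypothetical): " [<c02d5082>] _spin_lock+0xa/0x13".
# Matches a bracketed eight-digit hex address plus any trailing whitespace.
ADDR = re.compile(r"\[<[0-9a-fA-F]{8}>\]\s*")


def strip_addresses(trace: str) -> str:
    """Remove bracketed eight-digit hex addresses from each trace line."""
    cleaned = []
    for line in trace.splitlines():
        # Delete the address, then trim leftover leading/trailing blanks.
        cleaned.append(ADDR.sub("", line).strip())
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = " [<c02d5082>] _spin_lock+0xa/0x13"
    print(strip_addresses(sample))
```

The symbol names and offsets are what identify the code path, which is why the addresses can be left out.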
