Re: 2.6.21-rc2-mm2 hang

From: Andrew Morton
Date: Wed Mar 07 2007 - 17:55:44 EST


On Wed, 07 Mar 2007 14:12:16 -0800
Dave Hansen <hansendc@xxxxxxxxxx> wrote:

> I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests
> itself by the waitpid06 test in LTP hanging. This is very, very
> reproducible in about 5 seconds by adding '-s wait' to the ltp command
> line.
>
> I see 4 waitpid06 processes on my 4-way machine spinning in userspace.
> But, the weird part is that I can't ssh in once this happens, but I can
> log in to the console. I've bisected it down to:
>
> sched-fix-idle-load-balancing-in-softirqd-context
>
> http://sr71.net/~dave/linux/elm3b82-config
>
> sysrq-t:
>
> bash S 00000001 0 1670 1667 1676 (NOTLB)
> f761ff18 00000086 00000000 00000001 f7afb3ac c16f5f6c f761ff1c c0146eb5
> e93d3067 fffb8000 f761ff20 c2d922e0 f7ec4030 f7ec413c 00006db1 1457ab10
> 00000025 c038eb60 fffffe00 f7ec4030 f7ec40e4 f761ff88 c011bc38 c014800d
> Call Trace:
> [<c011bc38>] do_wait+0x271/0x316
> [<c011bd86>] sys_wait4+0x2f/0x31
> [<c011bdaf>] sys_waitpid+0x27/0x2c
> [<c010265c>] syscall_call+0x7/0xb
> =======================
> runltp S 00000001 0 1676 1670 1780 (NOTLB)
> f7625f18 00000082 00000000 00000001 f7adf3ac c16f5bec f7625f1c c0146eb5
> e8cf4067 fffab000 f7625f20 c2d9a2e0 c30ee050 c30ee15c 00007d8f 3c8acf22
> 00000025 c309e560 fffffe00 c30ee050 c30ee104 f7625f88 c011bc38 c014800d
> Call Trace:
> [<c011bc38>] do_wait+0x271/0x316
> [<c011bd86>] sys_wait4+0x2f/0x31
> [<c011bdaf>] sys_waitpid+0x27/0x2c
> [<c010265c>] syscall_call+0x7/0xb
> =======================
> pan S 00000000 0 1780 1676 1831 (NOTLB)
> f7639f30 00000086 00000000 00000000 00000000 00000246 f7ad095c f7f0e760
> 00000000 f7639f0c c0162748 c2d922e0 f7a0aab0 f7a0abbc 0000edaf abaf1d13
> 00000026 c038eb60 fffffe00 f7a0aab0 f7a0ab64 f7639fa0 c011bc38 e97e7065
> Call Trace:
> [<c011bc38>] do_wait+0x271/0x316
> [<c011bd86>] sys_wait4+0x2f/0x31
> [<c010265c>] syscall_call+0x7/0xb
> =======================
> waitpid06 S 00000001 0 1831 1780 1832 (NOTLB)
> f7637f18 00000082 00000000 00000001 f769b134 c16ed36c f7637f1c c0146eb5
> f7a38c80 00000000 f7bce030 c2d9a2e0 f7e78030 f7e7813c 00000f64 abdcd3a1
> 00000026 f7bce030 fffffe00 f7e78030 f7e780e4 f7637f88 c011bc38 c014800d
> Call Trace:
> [<c011bc38>] do_wait+0x271/0x316
> [<c011bd86>] sys_wait4+0x2f/0x31
> [<c011bdaf>] sys_waitpid+0x27/0x2c
> [<c010265c>] syscall_call+0x7/0xb
> =======================
>

Michal has reported a similar thing in recent -mm's.
sched-fix-idle-load-balancing-in-softirqd-context was added in
2.6.21-rc2-mm1.

I'll drop sched-fix-idle-load-balancing-in-softirqd-context.patch and
sched-fix-idle-load-balancing-in-softirqd-context-fix.patch and, as a
consequence, sched-dynticks-idle-load-balancing-v3.patch.

Thanks for doing the bisect - it really helps.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/