Oops in 2.1.131-pre3, was: Re: [2.1.130] High load average appearing from the shadows.

Jochen Heuer (jogi@planetzork.ping.de)
Wed, 2 Dec 1998 12:33:29 +0100


On Tue, Dec 01, 1998 at 11:35:04PM +0100, Jochen Heuer wrote:

I hate to follow up on my own posts, but I found out a little more.

> [jogi@planetzork jogi]$ ps axl | grep -E ' D '
> 0 500 3179 1 0 0 1728 896 down_failed D ? 0:00 cvs -z9 c
> 0 500 5724 1 0 0 796 392 down_failed D ? 0:00 du -s . .
> 0 500 17364 1 0 0 1728 896 down_failed D p0 0:00 cvs -z9 c
> 0 500 18034 1 0 0 1728 896 down_failed D ? 0:00 cvs -z9 c
> 0 500 28445 1 0 0 1728 896 down_failed D ? 0:00 cvs -z9 c
>
> My current load is
>
> [jogi@planetzork jogi]$ cat /proc/loadavg
> 7.00 7.01 6.74 3/92 18550
>
> which is caused by 2 rc5des taking all idle time and those 5 stuck
> processes. There were some changes in patch-2.1.130 regarding
> down_failed which I don't understand. Maybe this helps tracking down
> the problem.
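
As a sanity check on the numbers quoted above: the Linux load average counts runnable tasks as well as tasks in uninterruptible sleep (state D), so two CPU-bound clients plus five stuck processes would account for a load of about 7. The following is only a rough sketch of that arithmetic; parse_loadavg is a made-up helper name, and the counts are taken from the quoted text, not measured.

```python
def parse_loadavg(text):
    """Parse one line in the /proc/loadavg format quoted above."""
    one, five, fifteen, tasks, last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "load_1": float(one),
        "load_5": float(five),
        "load_15": float(fifteen),
        "running": int(running),
        "total": int(total),
        "last_pid": int(last_pid),
    }

info = parse_loadavg("7.00 7.01 6.74 3/92 18550")
cpu_hogs = 2  # the two rc5des clients mentioned in the quoted text
stuck = 5     # the five processes shown in state D
print(info["load_1"], cpu_hogs + stuck)  # prints: 7.0 7
```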

Running kernel-2.1.131-pre3 I got another stuck process:

[jogi@planetzork jogi]$ ps alwx | grep -E ' D ' | grep -v grep
0 500 1180 1 0 0 832 372 down_failed D ? 0:00 find
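
The grep for ' D ' works here but matches any column containing a lone D. A slightly more careful filter could look at the STAT column only. This is a minimal sketch, assuming the 13-column layout of the ps output pasted in this mail; the function names are made up for illustration, and lines with an empty WCHAN column are simply skipped.

```python
# Column order assumed from the "ps axl" / "ps alwx" output pasted in this mail.
COLUMNS = ["F", "UID", "PID", "PPID", "PRI", "NI", "SIZE", "RSS",
           "WCHAN", "STAT", "TTY", "TIME", "COMMAND"]

def parse_ps_line(line):
    """Split one process line into a dict keyed by column name."""
    fields = line.split(None, len(COLUMNS) - 1)  # keep COMMAND in one piece
    if len(fields) != len(COLUMNS):
        return None  # short or malformed line
    return dict(zip(COLUMNS, fields))

def stuck_processes(ps_output):
    """Return (PID, WCHAN, COMMAND) for processes in uninterruptible sleep."""
    result = []
    for line in ps_output.splitlines():
        proc = parse_ps_line(line)
        if proc and proc["STAT"].startswith("D"):
            result.append((int(proc["PID"]), proc["WCHAN"], proc["COMMAND"]))
    return result

sample = "0 500 1180 1 0 0 832 372 down_failed D ? 0:00 find"
print(stuck_processes(sample))  # prints: [(1180, 'down_failed', 'find')]
```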

The find was running on the same directory that caused the above cvs command
to get stuck. With 2.1.131-pre3, however, I also get an oops in dmesg:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
current->tss.cr3 = 050a2000, %cr3 = 050a2000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0172b26>]
EFLAGS: 00010212
eax: 04000000 ebx: c880e610 ecx: 00000010 edx: 00000002
esi: 00000000 edi: 20000000 ebp: 00000008 esp: c5afde60
ds: 0018 es: 0018 ss: 0018
Process find (pid: 1167, process nr: 87, stackpage=c5afd000)
Stack: c4f36f2e 00000009 c013c28d c880c000 40000000 c880a000 00000008 00000008
00000001 c01712fa c01fda2c c4f36f2e c4f36f30 00000002 00000000 c5afdf48
00000001 c016fb85 00000001 c4f36f2e c4f36f30 00000002 00000008 00000001
Call Trace: [<c013c28d>] [<c880c000>] [<c880a000>] [<c01712fa>] [<c016fb85>] [<c013a554>] [<c012d32c>]
[<c012f3b9>] [<c012f240>] [<c013a368>] [<c0107b00>]
Code: 8b 46 08 03 06 39 c7 7c 27 8b 5b 04 85 db 75 1e 57 68 3e 4c

[root@planetzork /root]# cat oops | ksymoops /boot/System.map
Using `/boot/System.map' to map addresses to symbols.

>>EIP: c0172b26 <raid0_map+a6/13c>
Trace: c013c28d <inode_getblk+45/1d4>
Trace: c880c000
Trace: c880a000
Trace: c01712fa <md_map+5e/68>
Trace: c016fb85 <ll_rw_block+e9/200>
Trace: c013a554 <ext2_readdir+1ec/584>
Trace: c012d32c <do_follow_link+78/84>
Trace: c012f3b9 <sys_getdents+f5/18c>
Trace: c012f240 <filldir>
Trace: c013a554 <ext2_readdir+1ec/584>
Trace: c0107b00 <system_call+34/38>
Code: c0172b26 <raid0_map+a6/13c>
Code: c0172b26 <raid0_map+a6/13c> 8b 46 08 movl 0x8(%esi),%eax
Code: c0172b29 <raid0_map+a9/13c> 03 06 addl (%esi),%eax
Code: c0172b2b <raid0_map+ab/13c> 39 c7 cmpl %eax,%edi
Code: c0172b2d <raid0_map+ad/13c> 7c 27 jl c0172b56 <raid0_map+d6/13c>
Code: c0172b2f <raid0_map+af/13c> 8b 5b 04 movl 0x4(%ebx),%ebx
Code: c0172b32 <raid0_map+b2/13c> 85 db testl %ebx,%ebx
Code: c0172b34 <raid0_map+b4/13c> 75 1e jne c0172b54 <raid0_map+d4/13c>
Code: c0172b36 <raid0_map+b6/13c> 57 pushl %edi
Code: c0172b37 <raid0_map+b7/13c> 68 3e 4c 00 90 pushl $0x90004c3e
Code: c0172b3c <raid0_map+bc/13c> 90 nop
Code: c0172b3d <raid0_map+bd/13c> 90 nop
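
The faulting address can be cross-checked against the first decoded instruction, movl 0x8(%esi),%eax. The sketch below just redoes that address calculation from the register dump; effective_address is a made-up helper name, and the register values are copied from the oops above.

```python
# Register values copied from the oops above (hex, as printed).
REGISTERS = {
    "eax": "04000000", "ebx": "c880e610", "ecx": "00000010", "edx": "00000002",
    "esi": "00000000", "edi": "20000000", "ebp": "00000008", "esp": "c5afde60",
}

def effective_address(base_reg, displacement, registers=REGISTERS):
    """Address touched by an operand of the form disp(%base), as a 32-bit value."""
    base = int(registers[base_reg], 16)
    return (base + displacement) & 0xFFFFFFFF

# First decoded instruction at the EIP: movl 0x8(%esi),%eax
fault = effective_address("esi", 0x8)
print(f"{fault:08x}")  # prints: 00000008
```

That matches the reported virtual address 00000008, so it looks like raid0_map is reading a field at offset 8 through a NULL pointer held in %esi. Which pointer that is cannot be told from the dump alone.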

System is plain 2.1.131-pre3 SMP compiled with gcc-2.7.2.3.

I hope this helps to squash this bug.

Regards,

Jogi

PS: If you need further information or want some patches tested, please let
me know.

-- 

Well, yeah ... I suppose there's no point in getting greedy, is there?

<< Calvin & Hobbes >>
