RE: mysterious 2.0.33 crashes

Ken Jordan (kenjordan@massmedia.com)
Tue, 17 Feb 1998 01:38:06 -0800 (PST)


On Mon, 16 Feb 1998, Alfredo Sanjuan wrote:

>
> >FWIW (as I already told in another mail): I'm having the same type
> >of problems on a system that happened to run for years without any
> >unexpected crashes with 1.2.13. After upgrading to 2.0.32/33 (and
> >with a complete new distribution, not based on glibc), the system
> >mysteriously halts 1 or 2 times a week without _any_ message.
>
>
> With kernel 2.0.29 my machine got an uptime of 80 days with no problems at all.
> Since I upgrade to
> 2.0.3[0,1,2,3] I'm getting several Oops every day, several processes in D state,
> several zombies... :-(
>
> Until 2.0.29 I can run the rc5 crack clients, since 2.0.30 I can't run it
> because it freezes the system in less than 10 minutes.
>
> I'm thinking to downgrade to 2.0.29...

I have to chime in also: 2.0.33 seems to have some bad problem(s). On
average I get a mysterious crash, panic, oops, or "D" state in net apps
every day or so. At first I thought it might be hardware related, so I
switched from a 486-133 to a Pentium-133, but the same types of problems
persist (and I have run diagnostics on both systems and can find no real
hardware problem). These same systems ran 2.0.29 with unbounded uptimes
(I had been using the 486-133 for well over a year as a 24/7 server).

Here are a few examples from the logs of the last few weeks (of the ones that
leave messages - I have also had several that take out the ext2 fs):

kernel: general protection: 0000
kernel: CPU: 0
kernel: EIP: 0010:[tcp_recvmsg+822/1036]
kernel: EFLAGS: 00010246
kernel: eax: 00f6c4d4 ebx: 0336c2c4 ecx: 00f6c438 edx: 80f6c4d4
kernel: esi: 00000063 edi: 01229f78 ebp: 00f6c414 esp: 01229ee8
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process irc (pid: 432, process nr: 50, stackpage=01229000)
kernel: Stack: 00f6c414 01229f7c 00000000 00000000 00000000 00f6c438 00000000 00000063
kernel: 0110ac0c 00978348 931c93f4 0014e9da 00f6c414 01229f78 0000019d 00000000
kernel: 00000000 01229f7c 00000200 00978300 08097998 00978390 001359c7 00978390
kernel: Call Trace: [inet_recvmsg+114/136] [sock_read+171/192]
[sys_read+192/232] [system_call+85/124]
kernel: Code: 89 42 04 89 10 6a 01 53 89 4c 24 1c e8 b9 3d ff ff 83 c4 08

kernel: Unable to handle kernel paging request at virtual address c51b62fc
kernel: current->tss.cr3 = 01f50000, %cr3 = 01f50000
kernel: *pde = 00000000
kernel: Oops: 0000
kernel: CPU: 0
kernel: EIP: 0010:[refile_buffer+383/800]
kernel: EFLAGS: 00010213
kernel: eax: 01426b98 ebx: 013f2098 ecx: 00000558 edx: 013f2098
kernel: esi: 00000003 edi: 013f2000 ebp: 0011d65b esp: 01b44e14
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process ncftp (pid: 1935, process nr: 48, stackpage=01b44000)
kernel: Stack: 013f2098 013f1c00 013f2000 00158fd5 013f2098 00000000 0011d65b 00000001
kernel: 00be7300 00000400 01426b98 00000400 001595a4 00be7300 0011d65b 01b44ef4
kernel: 00000008 000000d7 00000001 00be7300 00000002 0142175c 0015982d 00be7300
kernel: Call Trace: [ext2_alloc_block+273/412] [generic_file_mmap+23/180]
[block_getblk+348/612] [generic_file_mmap+23/180] [ext2_gekernel:
[sock_read+171/192] [sys_write+331/388] [system_call+85/124]
kernel: Code: 39 1c 95 9c e0 1e 00 75 21 8b 43 18 89 04 95 9c e0 1e 00 8b

kernel: general protection: 0000
kernel: CPU: 0
kernel: EIP: 0010:[do_munmap+220/1304]
kernel: EFLAGS: 00010202
kernel: eax: 08086000 ebx: 00391f18 ecx: 40005000 edx: 437b15d9
kernel: esi: 00391598 edi: 08086000 ebp: 01ff6b98 esp: 03b91e84
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process make (pid: 8178, process nr: 67, stackpage=03b91000)
kernel: Stack: 00391f18 00000012 00001000 01ff6b98 00811c00 00391598 00000246 00000006
kernel: 0000000a 00000018 00002520 00a07808 00193ecc 00000006 00000005 00a58000
kernel: 00001000 00000000 00000005 0000000a 0015a50d 0015a540 0334dc00 002211a0
kernel: Call Trace: [con_write+4876/4908] [ext2_lookup+129/368]
[ext2_lookup+180/368] [tty_default_put_char+30/40] [opost+440/456]
[kernel: [do_mmap+594/868] [sys_brk+341/364] [system_call+85/124]
kernel: Code: 8b 42 08 39 c1 74 1d 73 0f 89 94 24 d4 00 00 00 8b 52 14 eb

kernel: general protection: 0000
kernel: CPU: 0
kernel: EIP: 0010:[fldenv+681/756]
kernel: EFLAGS: 00010006
kernel: eax: 0022013c ebx: 002101f0 ecx: 00000100 edx: 002101f0
kernel: esi: 00220148 edi: f000e8a6 ebp: 002101f0 esp: 00b88f10
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process rc5des (pid: 454, process nr: 46, stackpage=00b88000)
kernel: Stack: 00000000 00000001 00000000 00213f00 00000001 00000000 00180cf2 0000222c
kernel: 001818a9 00220148 f000e8a6 00000080 00002218 00220148 0018182c 00213f00
kernel: 00182732 00220148 0009e1d8 20000000 0000000e 0010c95a 0000000e 00002218
kernel: Call Trace: [fstenv+74/536] [L_more_than_31+19/22]
[reg_u_sub+20/34] [hardreg_to_softreg+78/216] [setup_x86_irq+110/292]
[IRkernel: [do_signal+503/632]
kernel: Code: f3 66 6d 5b 5e 5f 5d 83 c4 10 c3 83 ec 08 55 57 56 53 8b 7c
kernel: Aiee, killing interrupt handler

kernel: general protection: 0000
kernel: CPU: 0
kernel: EIP: 0010:[filemap_nopage+575/736]
kernel: EFLAGS: 00010286
kernel: eax: 00000002 ebx: f000e73c ecx: c0007c0c edx: 00000001
kernel: esi: 00000000 edi: 01f3affc ebp: 00000000 esp: 02738e98
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process crond (pid: 8966, process nr: 72, stackpage=02738000)
kernel: Stack: 01f3ae18 00000073 00001000 02166498 00000000 01f3affc 00000003 02738ec4
kernel: 01f3ae18 021664e8 01f3a430 01f3ab6c 01f3acf0 01f3a1ac 01f3ab6c 01f3acf0
kernel: 0281e005 0015a50d 0015a540 032b1700 0015a56a 032b1700 032b1700 00000001
kernel: Call Trace: [reset_fdc_info+1/144] [reset_fdc_info+52/144]
[reset_fdc_info+94/144] [make_request+743/1020] [refill_freelist+kernel:
[merge_segments+1121/1224] [__do_down+95/188] [do_signal+613/632]
kernel: Code: 0f bf 69 12 eb 05 8d 76 00 31 ed 85 db 74 0a 0f bf 7b 12 89

kernel: Unable to handle kernel paging request at virtual address c89ea64c
kernel: current->tss.cr3 = 01efe000, %cr3 = 01efe000
kernel: *pde = 00000000
kernel: Oops: 0000
kernel: CPU: 0
kernel: EIP: 0010:[free_wait+40/68]
kernel: EFLAGS: 00010006
kernel: eax: 089ea648 ebx: 010fd048 ecx: 010fd03c edx: 089ea648
kernel: esi: 00000207 edi: 03d8ce9c ebp: 00000000 esp: 03d8ce74
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process netscape (pid: 10539, process nr: 68, stackpage=03d8c000)
kernel: Stack: 00000020 03ad5c7c 00000000 0012ce4e 03d8ce9c 00000020 00000000 bfffe0c8
kernel: bfffe6c8 010fd000 00000006 010fd000 0012d0a7 00000020 03d8cf54 03d8cf14
kernel: 03d8ced4 03d8cf74 03d8cf34 03d8cef4 bfffe6c8 00000020 bfffe064 bfffe05c
kernel: Call Trace: [do_select+414/484] [sys_select+387/596]
[sock_write+158/180] [unix_ioctl+135/156] [old_select+63/80]
[system_ca
kernel: Code: 8b 42 04 39 d8 74 05 89 c2 eb f5 90 89 4a 04 56 9d 8b 0f 85

kernel: invalid operand: 0000
kernel: CPU: 0
kernel: EIP: 0010:[<00d91477>]
kernel: EFLAGS: 00010046
kernel: eax: 00d91477 ebx: 001d9e8c ecx: 00000000 edx: 00000000
kernel: esi: 00000000 edi: 00000000 ebp: 03dddefc esp: 03ddded0
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process mv (pid: 20060, process nr: 68, stackpage=03ddd000)
kernel: Stack: 0010c90d 00000000 00000000 03dddefc 03dddefc 00149498 00000001 03dddf50
kernel: 0010b618 00000000 03dddefc 03f38018 00000000 001de1f4 00149498 00000001
kernel: 03dddf50 03f381b8 00000018 00000018 0000002b 0000002b fffffffe 0011243f
kernel: Call Trace: [do_IRQ+45/80] [tcp_retransmit_timer+0/224]
[fast_IRQ0_interrupt+88/128] [tcp_retransmit_timer+0/224]
[timer_bh+kernel: [sys_read+114/232] [system_call+85/124]
kernel: Code: f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0
kernel: Aiee, killing interrupt handler

kernel: stack segment: 0000
kernel: CPU: 0
kernel: EIP: 0010:[do_mmap+101/868]
kernel: EFLAGS: 00010287
kernel: eax: 00003000 ebx: 02df1b18 ecx: 00000400 edx: 03e05a78
kernel: esi: 00000012 edi: 00003000 ebp: bfffff1f esp: 03e04f6c
kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
kernel: Process bash (pid: 3938, process nr: 65, stackpage=03e04000)
kernel: Stack: 02df1b18 08093000 00003000 08096000 00000217 0011ab79 00000000 08093000
kernel: 00003000 00000007 00000012 00000000 03e05018 00003000 10000000 08092984
kernel: 00000003 00003000 0808bd67 0010a61d 08096000 08096000 4009cb74 00003000
kernel: Call Trace: [sys_brk+341/364] [system_call+85/124]
kernel: Code: f6 45 49 20 74 1d 8b 45 44 c1 e0 0c 01 f8 39 82 e0 01 00 00

(Geez, I guess I had a few issues).
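
For anyone who wants to tally reports like the ones above by faulting
function, here is a rough sketch. It assumes the symbolic "EIP:" line format
shown in these logs; the function name faulting_symbols and the regular
expression are my own illustration, not part of any kernel tool.

```python
import re
from collections import Counter

# Matches the EIP line as it appears in the logs above, e.g.
#   kernel: EIP: 0010:[tcp_recvmsg+822/1036]
#   kernel: EIP: 0010:[<00d91477>]
# Captures the symbol name, or the raw address when no symbol was resolved.
EIP_RE = re.compile(r"EIP:\s+[0-9a-fA-F]{4}:\[<?([^\]+>]+)")


def faulting_symbols(lines):
    """Count oops reports per faulting symbol (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = EIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Reads log lines from standard input; which file holds the kernel
    # messages depends on the local syslog configuration.
    for symbol, n in faulting_symbols(sys.stdin).most_common():
        print(n, symbol)
```

A spread across unrelated functions (networking, ext2, mmap, FPU emulation),
as in the traces above, would show up as many symbols with a count of one
each rather than one symbol dominating.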

I also got several of these:

kernel: Whee.. inode changed from under us. Tell Linus

And also some fatal scrolling messages about "page already on freelist"
(or something close to that).

My hardware (currently) is a Micron P-133 with a Micronics MB, 64MB EDO
RAM, a Quantum 3GB IDE disk, and PCI NE2000, DEC Tulip and WD8013 NICs (cable
modem, 100 and 10Mbps LAN).

The machine isn't too busy usually, but does do a fair bit of IP masq/net
activity. It's on a UPS and has good cooling.

I am running RH 4.2+updates with stock 2.0.33. More details available
upon request.
