Non-reproducible oopses in 2.0.33 (Long)

Joe Konopka (jkonopka@itol.com)
Mon, 5 Jan 1998 12:02:03 -0600 (CST)


Greetings,

Had a nasty mishap with 2.0.33 yesterday. It started like this:

Jan 4 16:00:15 haloblack kernel: Unable to handle kernel paging request at virtual address c813eed9
Jan 4 16:00:15 haloblack kernel: current->tss.cr3 = 00959000, \r3 = 00959000
Jan 4 16:00:15 haloblack kernel: *pde = 00000000
Jan 4 16:00:15 haloblack kernel: Oops: 0000
Jan 4 16:00:15 haloblack kernel: CPU: 0
Jan 4 16:00:15 haloblack kernel: EIP: 0010:[<0813eed9>]
Jan 4 16:00:15 haloblack kernel: EFLAGS: 00010212
Jan 4 16:00:15 haloblack kernel: Stack: 00001fc3 00000000 00000010 bfffe69c 00000000 34b0066f bfffe65c 0000541b
Jan 4 16:00:15 haloblack kernel: 0012301c 001179a3 00000001 bfffe69c 00000008 0222c018 bfffe694 0222c018
Jan 4 16:00:15 haloblack kernel: bfffefea 0010a665 bfffe69c 00000000 bfffe694 bfffe694 34b0066f bfffe65c
Jan 4 16:00:15 haloblack kernel: Call Trace: [sys_read+192/232] [sys_gettimeofday+27/112] [system_call+85/128] [floppy_setup+157/336]
Jan 4 16:00:15 haloblack kernel: Code: <1>Unable to handle kernel paging request at virtual address c813eed9
Jan 4 16:00:15 haloblack kernel: current->tss.cr3 = 00959000, \r3 = 00959000
Jan 4 16:00:15 haloblack kernel: *pde = 00000000
Jan 4 16:00:15 haloblack kernel: Oops: 0000
Jan 4 16:00:15 haloblack kernel: CPU: 0
Jan 4 16:00:15 haloblack kernel: EIP: 0010:[die_if_kernel+640/704]
Jan 4 16:00:15 haloblack kernel: EFLAGS: 00010212
Jan 4 16:00:15 haloblack kernel: eax: 00000010 ebx: 0000002b ecx: 0813eed9 edx: 03857810
Jan 4 16:00:15 haloblack kernel: esi: 00000000 edi: 02b95000 ebp: 02b94f38 esp: 02b94edc
Jan 4 16:00:15 haloblack kernel: ds: 0018 es: 0018 fs: 0010 gs: 002b ss: 0018
Jan 4 16:00:15 haloblack kernel: Process BitchX (pid: 18709, process nr: 6, stackpage=02b94000)
Jan 4 16:00:15 haloblack kernel: Stack: 0000002b 00000000 c813eed9 02b94f38 0222c018 04800000 05000000 04800000
Jan 4 16:00:15 haloblack kernel: 00190018 0011161a 00195cfc 02b94f38 00000000 00111328 00002000 34b0066f
Jan 4 16:00:15 haloblack kernel: bfffe65c 0010a5eb 02c38218 00000000 0010a7f0 02b94f38 00000000 00aead8c
Jan 4 16:00:15 haloblack kernel: Stack: 0000002b 00000000 c813eed9 02b94f38 0222c018 04800000 05000000 04800000
Jan 4 16:00:15 haloblack kernel: 00190018 0011161a 00195cfc 02b94f38 00000000 00111328 00002000 34b0066f
Jan 4 16:00:15 haloblack kernel: bfffe65c 0010a5eb 02c38218 00000000 0010a7f0 02b94f38 00000000 00aead8c
Jan 4 16:00:15 haloblack kernel: Call Trace: [<04800000>] [<05000000>] [<04800000>] [end_scsi_request+184/324] [do_page_fault+754/772] [do_page_fault+0/772] [handle_bottom_half+11/32]
Jan 4 16:00:15 haloblack kernel: [error_code+64/80] [sys_read+192/232] [sys_gettimeofday+27/112] [system_call+85/128] [floppy_setup+157/336]
Jan 4 16:00:15 haloblack kernel: Code: 64 8a 04 0e 0f a1 88 c2 81 e2 ff 00 00 00 89 54 24 10 52 68

Funny thing is, I see calls in the trace pertaining to SCSI and
floppy, but the box has no SCSI besides a parallel ZIP drive that was
not connected at the time, and there should've been no floppy activity
either. The oops proved fatal to the BitchX process in question: it
died with a segfault (nothing unusual for BX), and I didn't even
notice the oops until things started getting bizarre later.
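
One thing that can be checked directly from the first oops: the faulting
address and the EIP differ by exactly 0xC0000000. The sketch below only
does that subtraction. Reading 0xC0000000 as the base of the kernel
segments on 2.0/i386 is an assumption, not something the log states; if it
holds, the fault would be an instruction fetch at the EIP itself rather
than a bad data access, but that is a guess.

```python
# Values copied verbatim from the first oops in this post.
fault_addr = 0xC813EED9   # "virtual address c813eed9"
eip = 0x0813EED9          # "EIP: 0010:[<0813eed9>]"

# Assumption (not stated in the log): base of the kernel segments
# on a 2.0.x i386 kernel.
KERNEL_SEGMENT_BASE = 0xC0000000

delta = fault_addr - eip
print(hex(delta))                     # 0xc0000000
print(delta == KERNEL_SEGMENT_BASE)   # True
```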

I wasn't at the machine at the time, so I can't say for sure what
the hardware state was. 20 minutes later, this one:

Jan 4 16:20:25 haloblack kernel: task not on run-queue
Jan 4 16:20:26 haloblack last message repeated 621 times
Jan 4 16:20:26 haloblack kernel: general protection: 0000
Jan 4 16:20:26 haloblack kernel: CPU: 0
Jan 4 16:20:26 haloblack kernel: EIP: 0010:[datagram_select+29/380]
Jan 4 16:20:26 haloblack kernel: EFLAGS: 00010202
Jan 4 16:20:26 haloblack kernel: eax: 0013a75c ebx: 02ac324c ecx: fffffff2 edx: 00bdd018
Jan 4 16:20:26 haloblack kernel: esi: 00bdd018 edi: fffffff2 ebp: 02ac3200 esp: 00a54e14
Jan 4 16:20:26 haloblack kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 4 16:20:26 haloblack kernel: Process rpc.nfsd (pid: 21510, process nr: 98, stackpage=00a54000)
Jan 4 16:20:26 haloblack kernel: Stack: 00000001 00089f24 00000001 02ac3200 03619c0c 00151956 00bdd018 00000001
Jan 4 16:20:26 haloblack kernel: fffffff2 00000001 00137bf5 02ac3290 00000001 fffffff2 00137bd0 0012d04a
Jan 4 16:20:26 haloblack kernel: 02ac3200 00089f24 00000001 fffffff2 00000004 00089f24 fffffff2 00000000
Jan 4 16:20:26 haloblack kernel: Call Trace: [inet_select+34/44] [sock_select+37/48] [sock_select+0/48] [check+50/132] [do_select+253/484] [sys_select+387/596] [old_select+63/80]
Jan 4 16:20:26 haloblack kernel: [system_call+85/128] [dcache_lookup+196/356]
Jan 4 16:20:26 haloblack kernel: Code: 8b 07 3d 54 01 00 00 77 3b 8d 04 40 8b 57 04 8d 14 82 89 5a

There was no NFS activity at the time; the nfsd should have been
completely idle. Finally, this one:

Jan 4 16:21:30 haloblack kernel: stack segment: 0000
Jan 4 16:21:30 haloblack kernel: CPU: 0
Jan 4 16:21:30 haloblack kernel: EIP: 0010:[normal_select+157/448]
Jan 4 16:21:30 haloblack kernel: EFLAGS: 00010202
Jan 4 16:21:30 haloblack kernel: eax: 00724001 ebx: 00725000 ecx: 03a412ec edx: 00000001
Jan 4 16:21:30 haloblack kernel: esi: 03a412ec edi: 00000000 ebp: 00000000 esp: 00af0e1c
Jan 4 16:21:30 haloblack kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 4 16:21:30 haloblack kernel: Process in.telnetd (pid: 18319, process nr: 25, stackpage=00af0000)
Jan 4 16:21:30 haloblack kernel: Stack: 00725000 03a412ec 026a0300 00178d25 00725000 026a0300 03a412ec 00000001
Jan 4 16:21:30 haloblack kernel: 00000000 00178c94 03a412ec 00000001 026a0300 0012d04a 026a0300 03a412ec
Jan 4 16:21:30 haloblack kernel: 00000001 00000000 00000003 03a412ec 00000000 00000001 00000000 0012d199
Jan 4 16:21:30 haloblack kernel: Call Trace: [tty_select+145/164] [tty_select+0/164] [check+50/132] [do_select+253/484] [sys_select+387/596] [tcp_sendmsg+141/216] [tcp_sendmsg+206/216]
Jan 4 16:21:30 haloblack kernel: [inet_sendmsg+149/172] [sock_write+158/180] [sys_write+331/388] [old_select+63/80] [system_call+85/128] [load_aout_library+256/480]
Jan 4 16:21:30 haloblack kernel: Code: 83 f0 00 00 00 04 75 d3 51 e8 f9 c0 ff ff 83 c4 04 85 c0 75
Jan 4 16:21:30 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:30 haloblack last message repeated 61 times

This one left the telnetd immune to kill -9; it was in
an "RW" state according to ps. Also, a VERY large volume of pty messages
(upward of 50k) was logged. After various failed attempts to kill the
runaway telnetd, I rebooted the system. The syslog looks something like this:

Jan 4 16:21:30 haloblack kernel: release_dev:ad/write wait queue active!
Jan 4 16:21:30 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:34 haloblack last message repeated 5132 times
Jan 4 16:21:34 haloblack kernel: release_dev: pty0: read/wad/write wait queue active!
Jan 4 16:21:34 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:43 haloblack last message repeated 10575 times
Jan 4 16:21:43 haloblack kernel: release_dev: pty0: read/write wait quad/write wait queue active!
Jan 4 16:21:43 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:43 haloblack last message repeated 77 times
Jan 4 16:21:43 haloblack kernel: release_dad/write wait queue active!
Jan 4 16:21:43 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:43 haloblack last message repeated 234 times
Jan 4 16:21:43 haloblack kernel: release_dad/write wait queue active!
Jan 4 16:21:43 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:43 haloblack last message repeated 77 times
Jan 4 16:21:43 haloblack kernel: release_dad/write wait queue active!

Some of those seem to have their text corrupted. Nothing else of interest
follows in the syslog, just page after page of those until the reboot.
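
For anyone wanting to put a number on a flood like this, here is a rough
sketch of how the release_dev messages could be tallied, folding in
syslogd's "last message repeated N times" lines and flagging the entries
whose text does not match the intact message. The function name `tally`
and the `SAMPLE` excerpt are made up for illustration; the sample is just
the first four lines of the excerpt above, not the full log.

```python
import re

# syslogd collapses runs of identical lines into this summary line.
REPEAT = re.compile(r"last message repeated (\d+) times")
# The intact form of the message, as it appears in the excerpt above.
EXPECTED = "kernel: release_dev: pty0: read/write wait queue active!"

def tally(lines):
    """Return (total, garbled).

    total   -- release_dev messages counted, including collapsed repeats
    garbled -- release_dev lines whose text differs from EXPECTED
    """
    total = 0
    garbled = []
    for line in lines:
        line = line.rstrip()
        m = REPEAT.search(line)
        if m:
            # Assumes every repeat line in the input refers to a
            # release_dev message, which holds for this excerpt.
            total += int(m.group(1))
        elif "kernel: release_d" in line:
            total += 1
            if not line.endswith(EXPECTED):
                garbled.append(line)
    return total, garbled

# Hypothetical sample: the first four lines of the syslog excerpt above.
SAMPLE = """\
Jan 4 16:21:30 haloblack kernel: release_dev:ad/write wait queue active!
Jan 4 16:21:30 haloblack kernel: release_dev: pty0: read/write wait queue active!
Jan 4 16:21:34 haloblack last message repeated 5132 times
Jan 4 16:21:34 haloblack kernel: release_dev: pty0: read/wad/write wait queue active!
""".splitlines()

total, garbled = tally(SAMPLE)
print(total)          # 5135
print(len(garbled))   # 2
```

Run over the real file (for example `tally(open(path))`, with the path
depending on the local syslog setup), this would give the overall count
and a list of the mangled lines.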

I haven't been able to reproduce it; I'm just posting it in the hope that
someone can figure out what happened and/or learn something from it.