Help Tracking Down Oops

Steve Shah (sshah@cert.UCR.EDU)
Fri, 29 Jan 1999 10:14:36 -0800


Hello Everyone,

I've got a 2.0.36 RH5.2 box (the kernel is straight from the
distribution). It is a file/print server doing mostly Samba and NFS.
We were having a lot of oopses for a while, especially during backups.
After we updated to the stock 2.0.36 kernel from the distribution, we
hit 58 days of uptime before the box oopsed itself to death. What
makes this particular crash worrisome is that it keeps oopsing in the
same function. The machine is a P200 with 128 MB of RAM, 8 GB of IDE
split across a 6 GB and a 2 GB disk, and 16 GB of SCSI on an Adaptec
2940 card. The server gets medium to heavy use during the day and is
backed up at night with dump (via Amanda).

The EIP keeps pointing to find_candidate, which the 2.0.35 source
comments as "find a candidate buffer to be reclaimed." I'm not sure
whether this is a kernel bug or a hardware fault. Help? Please??
(Note: I've got a boss who is still unclear on the Linux development
cycle, so upgrading to 2.2 right now will only happen if someone can
show this is a bug. 2.2 upgrades are going to get scheduled in the
next 2-3 months.) Many, many thanks...

The oopses:

Jan 19 15:37:14 ozone kernel: Unable to handle kernel paging request at virtual address e670377d
Jan 19 15:37:14 ozone kernel: current->tss.cr3 = 07502000, %cr3 = 07502000
Jan 19 15:37:14 ozone kernel: *pde = 00000000
Jan 19 15:37:14 ozone kernel: Oops: 0000
Jan 19 15:37:14 ozone kernel: CPU: 0
Jan 19 15:37:14 ozone kernel: EIP: 0010:[find_candidate+42/228]
Jan 19 15:37:14 ozone kernel: EFLAGS: 00010206
Jan 19 15:37:14 ozone kernel: eax: 2670375d ebx: 02c8ea98 ecx: 00000400 edx: 0000af18
Jan 19 15:37:14 ozone kernel: esi: 00000000 edi: 00000007 ebp: 05229dbc esp: 05229db0
Jan 19 15:37:14 ozone kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 19 15:37:14 ozone kernel: Process smbd (pid: 21396, process nr: 70, stackpage=05229000)
Jan 19 15:37:14 ozone kernel: Stack: 02c8ea98 00000000 0000af19 00009c00 00126119 2670375d 05229de8 00000400
Jan 19 15:37:14 ozone kernel: 0004507a 05220303 00000001 0004507a 00000009 0000015e 0000af18 0000466b
Jan 19 15:37:14 ozone kernel: 00000000 02c8ea98 055fcb18 00000000 0012656e 00000400 0004507a 05229ef4
Jan 19 15:37:14 ozone kernel: Call Trace:[refill_freelist+889/1144] [getblk+854/952] [ext2_alloc_block+122/376] [block_getblk+347/612] [ext2_getblk+366/528] [ext2_file_write+387/1112] [idefloppy_get_flexible_disk_page+292/536]
Jan 19 15:37:14 ozone kernel: [inet_recvmsg+97/112] [sock_read+170/192] [sys_write+337/396] [system_call+85/124]
Jan 19 15:37:14 ozone kernel: Code: 39 48 20 74 21 6a 01 8d 55 08 52 50 e8 f9 17 00 00 83 c4 0c

Jan 19 15:37:32 ozone kernel: Unable to handle kernel paging request at virtual address e670377d
Jan 19 15:37:32 ozone kernel: current->tss.cr3 = 0368d000, %cr3 = 0368d000
Jan 19 15:37:32 ozone kernel: *pde = 00000000
Jan 19 15:37:32 ozone kernel: Oops: 0000
Jan 19 15:37:32 ozone kernel: CPU: 0
Jan 19 15:37:32 ozone kernel: EIP: 0010:[find_candidate+42/228]
Jan 19 15:37:32 ozone kernel: EFLAGS: 00010206
Jan 19 15:37:32 ozone kernel: eax: 2670375d ebx: 01aadd70 ecx: 00000400 edx: 0000af0f
Jan 19 15:37:32 ozone kernel: esi: 073f2798 edi: 00000007 ebp: 01aadd44 esp: 01aadd38
Jan 19 15:37:32 ozone kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 19 15:37:32 ozone kernel: Process smbd (pid: 32229, process nr: 37, stackpage=01aad000)
Jan 19 15:37:32 ozone kernel: Stack: 01aadd70 00000000 00000118 00002000 00125e83 2670375d 01aadd70 00000400
Jan 19 15:37:32 ozone kernel: 0000164b 001f0801 00000001 001bd645 0000164b 0000015e 0000af0f 001bd645
Jan 19 15:37:32 ozone kernel: 00000400 00000e44 0012624d 00000801 0012656e 00000400 0000164b 001f6e3c
Jan 19 15:37:32 ozone kernel: Call Trace: [refill_freelist+227/1144] [getblk+53/952] [getblk+854/952] [ext2_new_block+2077/2348] [getblk+53/952] [ext2_alloc_block+359/376] [getblk+53/952]
Jan 19 15:37:32 ozone kernel: [block_getblk+347/612] [ext2_getblk+253/528] [ext2_file_write+387/1112] [inet_recvmsg+97/112] [sock_read+170/192] [sys_write+337/396] [system_call+85/124]
Jan 19 15:37:32 ozone kernel: Code: 39 48 20 74 21 6a 01 8d 55 08 52 50 e8 f9 17 00 00 83 c4 0c

Jan 19 15:37:46 ozone kernel: Unable to handle kernel paging request at virtual address e670377d
Jan 19 15:37:46 ozone kernel: current->tss.cr3 = 01e0e000, %cr3 = 01e0e000
Jan 19 15:37:46 ozone kernel: *pde = 00000000
Jan 19 15:37:46 ozone kernel: Oops: 0000
Jan 19 15:37:46 ozone kernel: CPU: 0
Jan 19 15:37:46 ozone kernel: EIP: 0010:[find_candidate+42/228]
Jan 19 15:37:46 ozone kernel: EFLAGS: 00010202
Jan 19 15:37:46 ozone kernel: eax: 2670375d ebx: 01aadde8 ecx: 00000400 edx: 0000af0e
Jan 19 15:37:46 ozone kernel: esi: 073f2798 edi: 00000007 ebp: 01aaddbc esp: 01aaddb0
Jan 19 15:37:46 ozone kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Jan 19 15:37:46 ozone kernel: Process smbd (pid: 21402, process nr: 37, stackpage=01aad000)
Jan 19 15:37:46 ozone kernel: Stack: 01aadde8 00000000 00000118 0000c000 00125e83 2670375d 01aadde8 00000400
Jan 19 15:37:46 ozone kernel: 0004535a 01aa0303 00000001 0004535a 0004535a 0000015e 0000af0e 0004535a
Jan 19 15:37:46 ozone kernel: 07be3398 00000059 0012624d 00000303 0012656e 00000400 0004535a 01aadef4
Jan 19 15:37:46 ozone kernel: Call Trace: [refill_freelist+227/1144] [getblk+53/952] [getblk+854/952] [ext2_alloc_block+122/376] [block_getblk+347/612] [ext2_getblk+366/528] [ext2_file_write+387/1112]
Jan 19 15:37:46 ozone kernel: [inet_recvmsg+97/112] [sock_read+170/192] [sys_write+337/396] [system_call+85/124]
Jan 19 15:37:46 ozone kernel: Code: 39 48 20 74 21 6a 01 8d 55 08 52 50 e8 f9 17 00 00 83 c4 0c
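The three oopses above can be cross-checked by hand. The sketch below
assumes the 2.0.x i386 layout, where the kernel data segment (selector
0x18) has base 0xC0000000, and assumes the Code: line starts at EIP. It
decodes only the first instruction: the bytes 39 48 20 are
cmp %ecx,0x20(%eax). Adding the 0x20 displacement and the assumed
segment base to eax = 2670375d gives e670377d, the faulting address in
all three oopses. The helper names are made up for illustration.

```python
# Hypothetical helpers for cross-checking a 2.0.x i386 oops by hand.
# Assumption: kernel segments (cs 0x10, ds 0x18) have base 0xC0000000,
# so the faulting linear address is seg_base + register + displacement.
KERNEL_SEG_BASE = 0xC0000000

# 32-bit register numbering used by the ModRM reg and rm fields.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_cmp_r32_disp8(code: bytes):
    """Decode 'cmp r32, disp8(r32)': opcode 0x39, ModRM mod=01, no SIB."""
    opcode, modrm, disp8 = code[0], code[1], code[2]
    if opcode != 0x39:
        raise ValueError("not CMP r/m32, r32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base + disp8] without SIB is handled")
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # disp8 is signed
    return REGS[reg], REGS[rm], disp


def fault_linear_address(base_value: int, disp: int,
                         seg_base: int = KERNEL_SEG_BASE) -> int:
    """Linear address touched by disp(%base) through a segment at seg_base."""
    return (seg_base + base_value + disp) & 0xFFFFFFFF


# Code: bytes and eax value copied from the oopses above.
code = bytes.fromhex("39 48 20 74 21 6a 01 8d 55 08 52 50 e8 f9 17 00 00 83 c4 0c")
src, base, disp = decode_cmp_r32_disp8(code)
addr = fault_linear_address(0x2670375D, disp)
print(f"cmp %{src},{disp:#x}(%{base}) -> {addr:08x}")
```

This prints cmp %ecx,0x20(%eax) -> e670377d. The value in eax is odd
(not word-aligned), lies far above the 128 MB (0x08000000) of RAM, and
*pde = 00000000 shows nothing is mapped there, so the pointer being
dereferenced looks corrupted; ecx = 0x400 (1024) is plausibly the
requested block size. On its own this does not say whether a software
bug or flaky RAM did the corrupting.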

I'm seeing this oops scattered all over my logs right now. Not all of
them are bad enough to crash the system, but if one happens during a
dump, dump goes ballistic.
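To see how often this is happening, a small script can tally the EIP
lines in syslog. A minimal sketch, assuming klogd has already resolved
symbols into the "EIP: 0010:[symbol+offset/size]" form seen above; the
function name is hypothetical and the sample lines are copied from the
oopses in this message.

```python
import re
from collections import Counter

# Matches klogd-resolved lines like "EIP: 0010:[find_candidate+42/228]".
EIP_RE = re.compile(r"EIP:\s+[0-9a-fA-F]{4}:\[([^\]]+)\]")


def tally_oops_eips(lines):
    """Count oopses per faulting location (symbol+offset/size)."""
    counts = Counter()
    for line in lines:
        m = EIP_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "Jan 19 15:37:14 ozone kernel: EIP: 0010:[find_candidate+42/228]",
    "Jan 19 15:37:32 ozone kernel: CPU: 0",
    "Jan 19 15:37:32 ozone kernel: EIP: 0010:[find_candidate+42/228]",
]
print(tally_oops_eips(sample).most_common())  # [('find_candidate+42/228', 2)]
```

Against a real log, pass an open file instead of the sample list, for
example tally_oops_eips(open("/var/log/messages")), since iterating a
file yields one line at a time.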

Thanks in advance for any help.
-Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/