NFS Oops - the third today (2.2.13)

Jens Benecke (jens@pinguin.conetix.de)
Fri, 19 Nov 1999 22:07:58 +0100


Hi everybody,

For the third time, my server hits 10 days uptime and starts spitting
NFS-related Oops'es about. They all look like this. Restarting rpc.nfsd
usually fixes this, if you can get at the server in time (before all
clients hang.) But this is not acceptable for a production environment.

Client is running this Mandrake Linux 6.1 (basically a beefed-up Redhat
6.1) with SMP kernel on a UP machine because I was too lazy to recompile
the default install kernel (is that a problem? it hasn't been so far):

Linux mandrake.zuhause.de 2.2.13-22mdk #1 SMP Fri Oct 22 02:06:33 CEST 1999
i586 unknown
Kernel modules 2.1.121
Gnu C pgcc-2.91.66
Binutils 2.9.1.0.25
Linux C Library 2.1.1
Dynamic linker ldd (GNU libc) 2.1.1
Linux C++ Library 2.9.0
Procps 2.0.2
Mount 2.9w
Net-tools 1.52
Console-tools 0.2.0
Sh-utils 2.0
Sh-utils Parker.
Sh-utils
Sh-utils Inc.
Sh-utils NO
Sh-utils PURPOSE.
Modules Loaded nfs lockd sunrpc tulip nls_iso8859-1 nls_cp437 vfat
fat awe_wave sb uart401 sound soundlow soundcore ncr53c8xx

The server runs this:

Linux earth 2.2.13 #1 Don Okt 21 22:27:31 CEST 1999 i586 unknown
Kernel modules 2.1.121
Gnu C 2.95.2
Binutils 2.9.5.0.19
Linux C Library 2.1.2
Dynamic linker ldd: version 1.9.11
Procps 1.2.9
Mount 2.9g
Net-tools 2.01
Kbd 0.96
Sh-utils 1.16
Modules Loaded af_packet 3c509 ne2k-pci 8390 tulip unix

And this is the Oops:

Unable to handle kernel paging request at virtual address 00200000
current->tss.cr3 = 00628000, %cr3 = 00628000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[find_buffer+42/68]
EFLAGS: 00010206
eax: 00200000 ebx: 0022b9b3 ecx: 00000307 edx: 00200000
esi: 00000400 edi: 00000400 ebp: 00000307 esp: c06d1d90
ds: 0018 es: 0018 ss: 0018
Process rpc.nfsd (pid: 14764, process nr: 43, stackpage=c06d1000)
Stack: c0126fa1 00000307 0022b9b3 00000400 0022b9b2 000019b1 c0001a30 c06d0000
c012722f 00000307 0022b9b3 00000400 c0126fa1 0022b9b3 0022b9b3 c0317870
c0317870 c0035960 c1598200 c0317870 c0127488 c1598500 0022b8e5 00000400
Call Trace: [get_hash_table+29/44] [getblk+35/356] [get_hash_table+29/44] [__brelse+28/100] [ext2_alloc_block+116/352] [block_getblk+342/672] [sock_def_readable+45/56]
[ext2_getblk+527/540] [__brelse+19/100] [ext2_file_write+562/1484] [ip_rcv+715/764] [handle_IRQ_event+61/120] [net_bh+390/480] [ext2_file_lseek+0/128] [sys_write+195/232]
[common_interrupt+24/32] [system_call+52/56]
Code: 8b 12 39 58 04 75 f3 39 70 08 75 ee 66 39 48 0c 75 e8 89 c2
eth0: Promiscuous mode enabled.
device eth0 entered promiscuous mode

The last bit was me tcpdump'ing a bit - but I wasn't able to get anything
but a huge load of "client > server.nfs: 184 lookup [|nfs]" and "204
lookup" and "server.nfs > client.nfs: reply ok 128 lookup [|nfs]. I cannot
reproduce this bug.

I have had this problem with three different versions of nfsd (user space
btw) 2.2beta37 until 2.2beta4X and at least five different kernel revisions
from 2.2.12 thru 2.2.13preXX till 2.2.13. I would really like to know what
is going on here and if one of the pre-14 kernels will fix that.

Thanks for any help.

-- 
_ciao, Jens_______________________________ http://www.pinguin.conetix.de

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/