Kernel Oops - re: linux-nfs-0.4.21

John E. de Valpine (jedev@visarc.com)
Fri, 3 Oct 1997 09:53:34 -0400 (EDT)


Hi:

I can consistently reproduce the following kernel oops involving kernel
nfsd, statd and lockd (linux-nfs-0.4.21). I would appreciate any thoughts
on how I might go about resolving these issues.

>CONFIGURATION
Currently I have two dual processor machines (128M RAM) running Linux 2.1.55
smp kernels. The kernels have knfsd and nfs compiled in. In order to compile
the clients from linux-nfs-0.4.21 I found it necessary to update the src and
headers in linux-nfs-0.4.21:

kernel/fs/lockd
kernel/fs/nfs
kernel/fs/nfsd
kernel/net/sunrpc
kernel/include/linux/lockd
kernel/include/linux/nfsd
kernel/include/linux/sunrpc

with those found in the kernel source for 2.1.55. The machines differ only
in their respective network interfaces. In the following discussion the
machine called "alternation" is the machine that provides nfs services to
the machine called "displacement." The machine "alternation" also provides
DNS via named, which both machines rely on to talk to each other.

>TESTING LOCKD
In order to test lockd, I made some minor alterations to testlk.c (found in
tools/locktest in linux-nfs-0.4.21) so that testlk runs for an amount of
time specified on the command line before closing the file and returning.
Thus, instead of calling pause() and waiting for a signal before returning,
I call sleep(delay) and then close(fd). Note that "testlk" allows for
testing different locks on a file, i.e. read, write, block, test.
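
For readers without the linux-nfs tree at hand, the change described above
can be sketched roughly as follows. This is a minimal sketch, not the actual
testlk.c: it assumes the lock is a POSIX fcntl() record lock covering the
whole file, and the function name hold_lock() and its argument order are
made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Take a whole-file write lock on `path`, hold it for `delay` seconds,
 * then close the file, which releases the lock. Returns 0 on success and
 * -1 on failure. A nonzero `blocking` selects F_SETLKW (wait for the
 * lock) instead of F_SETLK (fail at once if the lock is held).
 * Hypothetical helper: name and arguments are not taken from testlk.c.
 */
int hold_lock(const char *path, int blocking, unsigned int delay)
{
    struct flock fl = {0};
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    if (fcntl(fd, blocking ? F_SETLKW : F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }

    sleep(delay);            /* stands in for pause(): hold for a fixed time */
    return close(fd);        /* closing the descriptor drops the lock */
}
```

Calling hold_lock("testfile", 1, 5) would then correspond to a blocking
lock held for five seconds.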

I then wrote a little perl script, called "locktest", that calls "testlk -b
testfile delaytime" (-b creates a blocking lock) n times, passing a random
delay (1-10) to testlk each time. I can run this script in multiple xterms
and watch the lock transfer from process to process. This works great on
either machine if run on a local testfile, and it works fine if one copy is
run by "displacement" on a testfile in an nfs mounted directory. But if
multiple copies are run on "displacement" (nfs client) on a testfile in an
nfs mounted directory, the result is a kernel oops on "alternation" (nfs
server), and the lockd kernel process on "alternation" goes zombie.
Additionally, lockd on "displacement" (nfs client) reports a failure to
monitor status on the IP for "alternation."

alternation   testfile   displacement   testfile   result
----------------------------------------------------------------------
locktest(n)   local                                ok
                         locktest(n)    local      ok
                         locktest(1)    nfs        ok
                         locktest(2)    nfs        oops
locktest(1)   local      locktest(1)    nfs        oops
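
The locktest driver itself is a perl script and is not reproduced here. An
equivalent loop, written in C only to match testlk.c, might look like the
sketch below. The function names are hypothetical, and it assumes "testlk"
is on the PATH and takes its arguments exactly as quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Pick a delay in the range 1..10, as described for locktest. */
int random_delay(void)
{
    return 1 + rand() % 10;
}

/* Format the testlk command line into buf; returns snprintf's result. */
int build_testlk_cmd(char *buf, size_t size, const char *file, int delay)
{
    assert(buf != NULL && file != NULL);
    return snprintf(buf, size, "testlk -b %s %d", file, delay);
}

/* Run testlk n times in sequence; returns the number of failed runs. */
int run_locktest(const char *file, int n)
{
    char cmd[512];
    int failures = 0;

    for (int i = 0; i < n; i++) {
        int len = build_testlk_cmd(cmd, sizeof cmd, file, random_delay());

        /* Count a truncated command or a nonzero exit status as a failure. */
        if (len < 0 || (size_t)len >= sizeof cmd || system(cmd) != 0)
            failures++;
    }
    return failures;
}
```

Starting run_locktest("testfile", n) from two or more xterms on the client
corresponds to the locktest(2) row in the table.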

>KERNEL OOPS REPORT FROM ALTERNATION
<4>nfsd: wheee, dentry count == 0!
<4>nfsd: wheee, dentry count == 0!
<4>nfsd: wheee, dentry count == 0!
<4>nfsd: wheee, dentry count == 0!
<4>nfsd: wheee, dentry count == 0!
<3>kfree: Bad obj c01e557d
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
<1>current->tss.cr3 = 00101000, %cr3 = 00101000
<1>*pde = 00000000
<4>Oops: 0002
<4>CPU: 1
<4>EIP: 0010:[<c0121eff>]
<4>EFLAGS: 00010286
<4>eax: 0000001b ebx: c0237284 ecx: c01e73d4 edx: c27b2000
<4>esi: c001e040 edi: c01e557d ebp: c38f2c64 esp: c273dea0
<4>ds: 0018 es: 0018 ss: 0018
<4>Process lockd (pid: 386, process nr: 41, stackpage=c273d000)
<4>Stack: c01c5645 c01e557d c001e000 c001e040 c001fc00 c38f2c64 c38f2c64 00000002
<4> 00000000 c015567d c01e557d c0156d2f c001e008 00000000 c38f2c90 c001fc00
<4> c0156e7d c001e000 00000000 c38f2c60 c38f27e0 c001ec00 c0158d64 c273df14
<4>Call Trace: [<c01c5645>] [<c015567d>] [<c0156d2f>] [<c0156e7d>] [<c0158d64>] [<c0157a4a>] [<c0180c0b>]
<4> [<c015df0e>] [<c0110bac>] [<c0181046>] [<c011066c>] [<c0156923>] [<c0180634>] [<c0156734>]
<4>Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 5b 5e 5f 5d 83 c4 0c

>KSYMOOPS
Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c0121eff <do_readv_writev+13/260>
Trace: c01c5645 <skb_put_errstr+49da/5579>
Trace: c015567d <skb_copy+19/17c>
Trace: c0156d2f <neigh_unlink+23/5c>
Trace: c0156e7d <neigh_purge_send_q+59/5c>
Trace: c0158d64 <unix_create+a4/f8>
Trace: c0157a4a <dev_get_info+36/90>
Trace: c0180c0b <fdc_specify+9f/268>
Trace: c015df0e <ip_defrag+12e/408>
Trace: c0110bac <sys_syslog+194/2ec>
Trace: c0181046 <interpret_errors+1ca/274>
Trace: c011066c <do_fork+76c/87c>
Trace: c0156923 <scm_detach_fds+37/1c0>
Trace: c0180634 <wait_for_completion+1c/64>
Trace: c0156734 <__scm_send+1e4/258>

Code: c0121eff <do_readv_writev+13/260>
Code: c0121eff <do_readv_writev+13/260> c7 05 00 00 00 movl $0x0,0x0
Code: c0121f04 <do_readv_writev+18/260> 00 00 00 00 00
Code: c0121f0f <do_readv_writev+23/260> 83 c4 08 addl $0x8,%esp
Code: c0121f12 <do_readv_writev+26/260> 5b popl %ebx
Code: c0121f13 <do_readv_writev+27/260> 5e popl %esi
Code: c0121f14 <do_readv_writev+28/260> 5f popl %edi
Code: c0121f15 <do_readv_writev+29/260> 5d popl %ebp
Code: c0121f16 <do_readv_writev+2a/260> 83 c4 0c addl $0xc,%esp
Code: c0121f1f <do_readv_writev+33/260>
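
For reference, the address-to-symbol mapping shown above amounts to finding
the last System.map entry at or below each address and reporting the
distance from it. The sketch below illustrates that lookup; it is not
ksymoops' own code. The two table entries are derived from the output above
(EIP c0121eff minus offset 0x13, and that start plus the reported length
0x260); the name of the second entry is unknown and is a placeholder.

```c
#include <assert.h>
#include <stddef.h>

struct ksym {
    unsigned long addr;
    const char *name;
};

/*
 * Given a table sorted by ascending address, return the last symbol whose
 * address is <= addr and store the distance from it in *offset.
 * Returns NULL if addr lies below the first symbol in the table.
 */
const struct ksym *resolve(const struct ksym *tab, size_t n,
                           unsigned long addr, unsigned long *offset)
{
    size_t lo = 0, hi = n;   /* search window is [lo, hi) */

    assert(tab != NULL && offset != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;

    *offset = addr - tab[lo - 1].addr;
    return &tab[lo - 1];
}

/* Entries implied by the ksymoops output; the second name is a placeholder. */
static const struct ksym example_map[] = {
    { 0xc0121eecUL, "do_readv_writev" },     /* 0xc0121eff - 0x13 */
    { 0xc012214cUL, "(next symbol)" },       /* 0xc0121eec + 0x260 */
};
```

The result is only as good as the table: the lookup is meaningful only if
System.map was generated from the same kernel build that produced the oops.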

>RPC: UNAUTHENTICATED REQUESTS
On both machines I get the following error in KLOG if the query is made to
the localhost (RPC times out and reports that version 0 of service 100003 is
unavailable):
rpcinfo -u alternation nfs #on alternation
nfsd reports: unauthenticated request from <IP for alternation>
rpcinfo -u displacement nfs #on displacement
nfsd reports: unauthenticated request from <IP for displacement>

On the other hand:
rpcinfo -u alternation nfs #on displacement
RPC reports that version 2 of service 100003 is ready

But:
rpcinfo -u displacement nfs #on alternation
nfsd reports: unauthenticated request from <IP for displacement>

Although "displacement" is not serving NFS, it still has nfs compiled into
the kernel as an fs, so it should be able to reply to the query. I suspect
there must be some small configuration difference somewhere, but I cannot
put my finger on it.
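
For anyone trying to reproduce these queries without rpcinfo: "rpcinfo -u"
calls procedure 0 (the null procedure) of the given program over UDP, and
when no version is given it probes version 0 first, which is why version 0
shows up in the timeout message. The sketch below builds such a request
with AUTH_NONE credentials, following the ONC RPC message layout. It only
fills a buffer; the portmapper lookup and the UDP send are left out, and
the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFS_PROGRAM 100003u   /* the service number quoted above */

/* Store v at p in big-endian (XDR) byte order. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Build an ONC RPC call to procedure 0 of prog/vers with AUTH_NONE
 * credentials and verifier. Returns the message length (40 bytes), or 0
 * if the buffer is too small.
 */
size_t build_null_call(uint8_t *buf, size_t size, uint32_t xid,
                       uint32_t prog, uint32_t vers)
{
    const uint32_t words[10] = {
        xid,
        0,      /* msg_type: CALL */
        2,      /* RPC protocol version */
        prog,
        vers,
        0,      /* procedure 0: the null procedure */
        0, 0,   /* credentials: flavor AUTH_NONE, length 0 */
        0, 0,   /* verifier: flavor AUTH_NONE, length 0 */
    };

    assert(buf != NULL);

    if (size < sizeof words)
        return 0;
    for (size_t i = 0; i < 10; i++)
        put_u32(buf + 4 * i, words[i]);
    return sizeof words;
}
```

build_null_call(buf, sizeof buf, xid, NFS_PROGRAM, 2) gives the version 2
request that "alternation" answers when queried from "displacement".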

Thank you for your time in reviewing this. I would greatly appreciate any
help or advice that you might be able to offer.

Regards,

-Jack de Valpine

--------------------
### John E. de Valpine
### President
### VISARC Incorporated
###
### 617.241.0727
###
### http://www.visarc.com
###
### visualizing the built environment