NFS strangeness...

Dan Merillat (Dan@merillat.org)
Sat, 2 Nov 1996 20:58:09 -0500 (EST)


Not sure what this is, but it appears to be NFS related.

I have suddenly started getting segfaults when I ls an NFS-mounted
directory, and after a while the NFS mount itself started failing. Nothing
has changed except a kernel upgrade (2.0.12 to 2.0.22).

Here's the (snipped) strace:

<library loads snipped>
brk(0xf000) = 0xf000
lstat("/home/admin/sabbat", {st_mode=S_IFDIR|0770, st_size=2048, ...}) = 0
brk(0x10000) = 0x10000
<gettimeofday and such snipped>
brk(0x11000) = 0x11000
open("/etc/passwd", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1668, ...}) = 0
brk(0x12000) = 0x12000
read(3, "root:<snip>:0:0:root:/roo"..., 4096) = 1668
brk(0x13000) = 0x13000
read(3, "", 4096) = 0
close(3) = 0
brk(0x14000) = 0x14000
open("/etc/group", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=963, ...}) = 0
brk(0x15000) = 0x15000
read(3, "root::0:root\nbin::1:root,bin,da"..., 4096) = 963
brk(0x16000) = 0x16000
uname({sys="Linux", node="chaos", ...}) = 0
getpid() = 440
brk(0x17000) = 0x17000
gettimeofday({846985118, 369599}, NULL) = 0
getpid() = 440
socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
getpid() = 440
bind(4, {sin_family=AF_INET, sin_port=htons(616), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
ioctl(4, FIONBIO, [1]) = 0
sendto(4, "3~U\231\0\0\0\0\0\0\0\2\0\1\206\240"..., 56, 0, {sin_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("127.0.0.1")}, 16) = 56
select(256, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {5, 0})
recvfrom(4, "3~U\231\0\0\0\1\0\0\0\0\0\0\0\0\0"..., 400, 0, {sin_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("127.0.0.1")}, [16]) = 28
close(4) = 0
close(4) = -1 EBADF (Bad file number)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
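
As an aside on the trace above: the sendto() to port 111 looks like an RPC
call to the local portmapper rather than NFS traffic. The sketch below
decodes the 16 bytes strace printed, assuming the usual ONC RPC call header
layout (xid, message type, RPC version, program number, each a 32-bit
big-endian word). The variable names are mine, and the rest of the packet is
truncated by strace, so nothing beyond the program number is decoded.

```python
import struct

# First 16 bytes of the sendto() payload, copied from the strace above.
# strace prints octal escapes: \231 = 0x99, \206 = 0x86, \240 = 0xa0.
payload = b"3~U\231\0\0\0\0\0\0\0\2\0\1\206\240"

# Assumed ONC RPC call header: xid, message type (0 = call),
# RPC version, program number, all 32-bit big-endian.
xid, msg_type, rpc_version, program = struct.unpack(">IIII", payload)

PORTMAPPER_PROGRAM = 100000  # well-known RPC program number of the portmapper

print(f"xid=0x{xid:08x} type={msg_type} rpcvers={rpc_version} prog={program}")
print("portmapper call" if program == PORTMAPPER_PROGRAM else "some other program")
```

The recvfrom() reply starts with the same four xid bytes, so it does appear
to be the answer to this call.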

And the dump of what went over the net: (chaos is the client, eola is the
server)

21:38:38.247334 chaos.ao.net.5d1a9fb1 > eola.ao.net.nfs: 160 lookup fh
Unknown/1 "admin" (ttl 64, id 1254)

21:38:38.257334 eola.ao.net.nfs > chaos.ao.net.5d1a9fb1: reply ok 128 lookup
fh Unknown/1 DIR 40755 ids 0/500 sz 1024 nlink 14 rdev 0 fsid 1 nodeid
2125e001 a/m/ctime 846985092.000000 846193346.000000 846193346.000000 (ttl
64, id 10438)

21:38:38.267334 chaos.ao.net.5d1a9fb2 > eola.ao.net.nfs: 160 lookup fh
Unknown/1 "sabbat" (ttl 64, id 1255)

21:38:38.277334 eola.ao.net.nfs > chaos.ao.net.5d1a9fb2: reply ok 128 lookup
fh Unknown/1 DIR 40770 ids 1607/550 sz 2048 nlink 13 rdev 0 fsid 1 nodeid
2b0090ea a/m/ctime 846984351.000000 843092593.000000 843092593.000000 (ttl
64, id 10440)

I'm not sure what that means... it doesn't segfault if I add -n to the ls,
which skips the name lookup. Anyway, is this an NFS server or client issue?
It looks like a client problem (which is why I posted it here): the server
seems to have returned good data for everything.
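
To make the second reply in the dump a bit more readable, here is a small
sketch that decodes the fields I copied out of it by hand. It assumes the
"DIR 40770" field is the mode in octal and "ids 1607/550" is uid/gid; those
numeric ids are what ls would have to turn into names when -n is not given.

```python
import stat

# Values copied from the second lookup reply in the tcpdump output:
# "DIR 40770 ids 1607/550" (assumed: mode in octal, ids as uid/gid).
mode = 0o40770
uid, gid = 1607, 550

# stat.filemode() renders the mode the way ls -l would show it.
print(stat.filemode(mode), uid, gid)  # drwxrwx--- 1607 550
```

That matches the st_mode=S_IFDIR|0770 the lstat() call returned in the
strace, so the attributes at least look consistent on both sides.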

This kernel hasn't had a problem until now, though I haven't looked at that
directory before from here. (I don't think so, at least.)

Any ideas? I'm in the middle of compiling a stock 2.0.24 kernel (I'm running
with the SYN-flood patches and the ip-fragment patch, but I don't think
either of them should cause this.)

The only thing that was logged is:
nfs_rpc_verify: RPC call failed: 5

And only once... the segfault is completely reproducible. I'm just
going to avoid that directory for now... I'm not quite sure what is different
about it.

BTW: what NFS server version should I be using? I'm not sure what the
remote version is, but I could check.

--Dan