nfsroot Bus error, comm. error

joost witteveen (joost@rulcmc.leidenuniv.nl)
Tue, 17 Sep 1996 23:43:59 +0200 (MET DST)


I frequently boot Linux using the nfsroot system.

However, sometimes (this seems to happen more often when the
network is busy) a random programme started on a system using
an nfsroot filesystem will abort with a "bus error" (signal 7).
When the programme is restarted, it usually runs successfully.

As the nfsroot systems usually start 50-something processes during
boot, this results in a boot failure rate of approx. 50%, and it thus
becomes a real problem.
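
A back-of-the-envelope check of what that figure implies per process. This is only a sketch: it assumes that each of the 50 processes fails independently and with the same probability, which I have not verified, and the function names are made up for illustration. Under that assumption a 50% failure rate corresponds to roughly 1.4% per process.

```python
# Assumption (not measured): every process started at boot fails
# independently with the same probability p.

def boot_failure(p: float, n_processes: int) -> float:
    """Probability that at least one of n_processes fails."""
    return 1.0 - (1.0 - p) ** n_processes


def per_process_failure(boot_failure_rate: float, n_processes: int) -> float:
    """Solve 1 - (1 - p)**n = boot_failure_rate for p."""
    return 1.0 - (1.0 - boot_failure_rate) ** (1.0 / n_processes)


# Figures from the text: about 50 processes, about 50% failures.
p = per_process_failure(0.5, 50)
print(f"per-process failure probability: {p:.4f}")
print(f"check, failure rate over 50 processes: {boot_failure(p, 50):.2f}")
```

So even a rare per-process error is enough to break about half of the boots.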

When looking at the boot process with tcpdump, the first thing
that differs between a successful boot and an unsuccessful one seems
to be: "ERROR: Communication error on send [|nfs]". It was my understanding
that such errors are in principle harmless, and should only result
in the client trying to re-read the file. This indeed happens, but
apparently with some error.

I would really appreciate any info -- either how I can get better
diagnostics, or other things I may try.

System:
server Linux-{2.0.14,2.0.20}
client Linux-{2.0.15,2.0.18,2.0.20}
(I tried most permutations with server/client kernel, no diff)

$ /usr/sbin/rpc.nfsd -v
Universal NFS Server 2.2beta16

A slightly longer tcpdump (mail me if you want more, or any other info):

rulvsc is the nfsroot client, rulcmc is the server.

tcpdump -s 200 host rulvsc
[..]
22:52:05.723068 rulvsc.LeidenUniv.nl.1c0d18ba > rulcmc.leidenuniv.nl.nfs: 116 lookup fh Unknown/1 "lib"
22:52:05.723068 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18ba: reply ok 128 lookup fh Unknown/1
22:52:05.723068 rulvsc.LeidenUniv.nl.1c0d18bb > rulcmc.leidenuniv.nl.nfs: 128 lookup fh Unknown/1 "ld-linux.so.1"
22:52:05.753069 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bb: reply ok 128 lookup fh Unknown/1
22:52:05.753069 rulvsc.LeidenUniv.nl.1c0d18bc > rulcmc.leidenuniv.nl.nfs: 108 getattr fh Unknown/1
22:52:05.773069 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bc: reply ok 96 getattr REG 100755 ids 0/0 sz 99308
22:52:05.773069 rulvsc.LeidenUniv.nl.1c0d18bd > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 0
22:52:05.803070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bd: reply ok 1124 read
22:52:05.803070 rulvsc.LeidenUniv.nl.1c0d18be > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 1024
22:52:05.813070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18be: reply ok 1124 read
22:52:05.813070 rulvsc.LeidenUniv.nl.1c0d18bf > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 2048
22:52:05.833070 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18bf: reply ok 1124 read
22:52:05.843071 rulvsc.LeidenUniv.nl.1c0d18c0 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 3072
22:52:05.843071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c0: reply ok 1124 read
22:52:05.853071 rulvsc.LeidenUniv.nl.1c0d18c1 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 16384
22:52:05.863071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c1: reply ok 1124 read
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c2 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 17408
22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c2: reply ok 28 read ERROR: Communication error on send [|nfs]
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c3 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 16384
22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c3: reply ok 1124 read
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c4 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 17408
22:52:05.883071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c4: reply ok 1124 read
22:52:05.883071 rulvsc.LeidenUniv.nl.1c0d18c5 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 18432
22:52:05.883071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c5: reply ok 1124 read
22:52:05.893072 rulvsc.LeidenUniv.nl.1c0d18c6 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 19456
22:52:05.893072 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c6: reply ok 720 read

(this is the end, here all communication ends)
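
To pick such events out of a longer capture, something like the following could be used. It is a minimal sketch, not a tested tool: it assumes the tcpdump text format shown above (the xid as the last dotted component of the client name, replies containing the word "ERROR"), and the names REQUEST, REPLY and analyse are my own. It reports each read reply that carries an error, together with the offset of the request it answers, and every offset that the client asked for more than once.

```python
import re

# Patterns for the tcpdump text format seen in the trace (an assumption;
# other tcpdump versions may print NFS traffic differently).
REQUEST = re.compile(
    r"^\S+ \S+\.([0-9a-f]+) > \S+\.nfs: \d+ read fh \S+ \d+ bytes @ (\d+)")
REPLY = re.compile(
    r"^\S+ \S+\.nfs > \S+\.([0-9a-f]+): reply \S+ \d+ read(.*)$")


def analyse(lines):
    """Return (failed, reread): error replies and offsets requested again."""
    offset_by_xid = {}   # xid of each read request -> file offset
    seen_offsets = set()
    failed = []          # (xid, offset) of read replies flagged ERROR
    reread = []          # offsets that were requested a second time
    for line in lines:
        m = REQUEST.match(line)
        if m:
            xid, offset = m.group(1), int(m.group(2))
            offset_by_xid[xid] = offset
            if offset in seen_offsets:
                reread.append(offset)
            seen_offsets.add(offset)
            continue
        m = REPLY.match(line)
        if m and "ERROR" in m.group(2):
            failed.append((m.group(1), offset_by_xid.get(m.group(1))))
    return failed, reread


# Eight lines copied from the trace above.
SAMPLE = """\
22:52:05.853071 rulvsc.LeidenUniv.nl.1c0d18c1 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 16384
22:52:05.863071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c1: reply ok 1124 read
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c2 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 17408
22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c2: reply ok 28 read ERROR: Communication error on send [|nfs]
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c3 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 16384
22:52:05.873071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c3: reply ok 1124 read
22:52:05.873071 rulvsc.LeidenUniv.nl.1c0d18c4 > rulcmc.leidenuniv.nl.nfs: 120 read fh Unknown/1 1024 bytes @ 17408
22:52:05.883071 rulcmc.leidenuniv.nl.nfs > rulvsc.LeidenUniv.nl.1c0d18c4: reply ok 1124 read
"""

failed, reread = analyse(SAMPLE.splitlines())
print("error replies:", failed)
print("offsets read again:", reread)
```

On those eight lines it flags the reply to xid 1c0d18c2 (the read at offset 17408) and shows that the client then asked again for offsets 16384 and 17408.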

Thanks very much,

-- 
joost witteveen
            joost@rulcmc.leidenuniv.nl
          joostje@debian.org
--
Use Debian/GNU Linux!