knfsd/nfs experts: help me w/this tcpdump output

David Mansfield (david@cobite.com)
Sun, 6 Dec 1998 15:22:54 -0500 (EST)


I am currently trying to track down the answer to: why doesn't knfsd work
w/ my solaris boxen? I am running 2.1.131-ac4 (thanks for the knfsd patch
merge, Alan :-) and consistently get this error on my various solaris
nfs clients (solaris v2.5+patches and 2.6-3/31):

Dec 6 14:19:13 marvin unix: NFS write error on host spike: I/O error.
Dec 6 14:19:13 marvin unix: (file handle: 803e09ce 2780000 1780000 4300000 4300000 2000000 0 0)

I have tweaked and re-tweaked the mount params on solaris to no avail.
Currently I'm using rw,rsize=1024,wsize=1024,vers=2,proto=udp,noac
because that combination triggers the error more quickly. In the past I
was using an rsize/wsize of 4096 without the 'noac' option, which seems
like a pretty normal setup.
After mounting, I ran 'bonnie' on the solaris box, with

tcpdump -s 256 host marvin and host spike >tcpdump.log

running on the linux server. bonnie exited (w/ an error), and near the
end, tcpdump.log shows (slightly edited):

14:19:11.366204 marvin.3628821321 > spike.nfs: 1184 write fh Unknown/1 1024 (1024) bytes @ 56772608 (56772608) (DF)
14:19:11.366274 spike.nfs > marvin.3628821321: reply ok 96 write
14:19:12.606844 marvin.3628821321 > spike.nfs: 1184 write fh Unknown/1 1024 (1024) bytes @ 56772608 (56772608) (DF)
14:19:12.606926 spike.nfs > marvin.3628821321: reply ok 28 write
14:19:12.608642 marvin.3628821322 > spike.nfs: 164 write fh Unknown/1 1 (1) bytes @ 56778752 (56778752) (DF)
14:19:13.031896 spike.nfs > marvin.3628821322: reply ok 96 write
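
Duplicated requests like the one above can also be picked out of a longer capture mechanically. A minimal Python sketch, assuming every packet sits on one line in the same format as the excerpt (wrapped continuations joined back on) and relying on tcpdump printing the RPC transaction id after the client's hostname, as it does here. parse() and retransmissions() are hypothetical helpers, not part of any existing tool:

```python
import re
from collections import defaultdict

# Excerpt from the capture above, one packet per line.
CAPTURE = """\
14:19:11.366204 marvin.3628821321 > spike.nfs: 1184 write fh Unknown/1 1024 (1024) bytes @ 56772608 (56772608) (DF)
14:19:11.366274 spike.nfs > marvin.3628821321: reply ok 96 write
14:19:12.606844 marvin.3628821321 > spike.nfs: 1184 write fh Unknown/1 1024 (1024) bytes @ 56772608 (56772608) (DF)
14:19:12.606926 spike.nfs > marvin.3628821321: reply ok 28 write
14:19:12.608642 marvin.3628821322 > spike.nfs: 164 write fh Unknown/1 1 (1) bytes @ 56778752 (56778752) (DF)
14:19:13.031896 spike.nfs > marvin.3628821322: reply ok 96 write
"""

# Client -> server write: timestamp, xid, RPC length, byte count, offset.
CALL_RE = re.compile(
    r"^(\S+) \S+\.(\d+) > \S+: (\d+) write fh \S+ (\d+) \(\d+\) bytes @ (\d+)"
)
# Server -> client reply: timestamp, xid, RPC-level status, RPC length.
REPLY_RE = re.compile(r"^(\S+) \S+ > \S+\.(\d+): reply (\w+) (\d+) write")


def parse(capture):
    """Group write calls and replies by RPC transaction id (xid)."""
    calls = defaultdict(list)    # xid -> [(time, offset, count), ...]
    replies = defaultdict(list)  # xid -> [(time, status, length), ...]
    for line in capture.splitlines():
        m = CALL_RE.match(line)
        if m:
            t, xid, _length, count, offset = m.groups()
            calls[int(xid)].append((t, int(offset), int(count)))
            continue
        m = REPLY_RE.match(line)
        if m:
            t, xid, status, length = m.groups()
            replies[int(xid)].append((t, status, int(length)))
    return calls, replies


def retransmissions(calls, replies):
    """Map each xid the client sent more than once to its reply lengths."""
    return {
        xid: [length for _, _, length in replies.get(xid, [])]
        for xid, sent in calls.items()
        if len(sent) > 1
    }


calls, replies = parse(CAPTURE)
for xid, lengths in sorted(retransmissions(calls, replies).items()):
    print(f"xid {xid}: sent {len(calls[xid])} times, reply lengths {lengths}")
```

Run against the excerpt, it reports one xid, 3628821321, sent twice, with reply lengths 96 and 28. For a full capture, CAPTURE would be replaced with the contents of tcpdump.log.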

Note that the same "transaction id" occurs twice in a row after a
substantial pause (about 1.24 seconds; was the reply dropped?), and that
the second reply is only 28 bytes long. The next oddity is that the
position (the number after the @, if I'm interpreting it correctly) then
jumps to 56778752 for a 1-byte write, after having consistently increased
by exactly 1024 each time (reinforcing the dropped-packets theory). The
remaining packets in tcpdump.log are writes which "fill in" the range
between 56770560 and 56778752, at which point the data ends because
bonnie has bombed out. This range covers the "bad area" plus a bit
before.
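
The offset arithmetic can be checked directly. A small Python sketch using only the numbers quoted above; WSIZE comes from the mount options, while the 8192-byte BLOCK is an arbitrary, hypothetical size chosen only to test alignment, not something the capture shows:

```python
# Offsets quoted in the message (bytes from the start of bonnie's file).
DUPLICATED_WRITE = 56772608   # the 1024-byte write that was sent twice
NEXT_SEEN_WRITE = 56778752    # the 1-byte write that follows it
FILL_START = 56770560         # start of the range later "filled in"
WSIZE = 1024                  # wsize= from the Solaris mount options
BLOCK = 8192                  # hypothetical block size to test alignment against

# Where the next write would start if the sequence had stayed contiguous.
expected_next = DUPLICATED_WRITE + WSIZE
missing_bytes = NEXT_SEEN_WRITE - expected_next
print(f"skipped {missing_bytes} bytes = {missing_bytes // WSIZE} writes of {WSIZE}")

# Size and alignment of the range the later writes fill in.
fill_span = NEXT_SEEN_WRITE - FILL_START
print(f"fill-in range is {fill_span} bytes; "
      f"start % {BLOCK} = {FILL_START % BLOCK}, end % {BLOCK} = {NEXT_SEEN_WRITE % BLOCK}")
```

It shows 5120 bytes (five 1024-byte writes) missing between the duplicated write and the 1-byte write, and that the filled-in range is exactly 8192 bytes long, starting and ending on 8192-byte boundaries. Whether that alignment means anything on the solaris side is an open question.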

Perhaps solaris cannot handle the 28-byte reply to the duplicated
request? How can I check this?
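
One way to narrow this down: using the wire formats in RFC 1057 (RPC) and RFC 1094 (NFS version 2), both reply sizes can be accounted for field by field, and the status word can be read from a hex dump of the packet (tcpdump's -x flag prints packet contents in hex). The Python sketch below assumes an accepted RPC reply with an empty AUTH_NULL verifier and takes the reply's RPC payload with the IP and UDP headers already stripped; describe_write_reply() and write_reply_status() are hypothetical helpers. The "ok" in tcpdump's output appears to describe the RPC-level result, so it does not by itself rule out an NFS-level error.

```python
import struct

# Accepted RPC reply header, assuming an AUTH_NULL verifier with an empty body.
RPC_REPLY_HEADER = (
    4    # xid
    + 4  # msg_type (REPLY)
    + 4  # reply_stat (MSG_ACCEPTED)
    + 4  # verifier flavor (AUTH_NULL)
    + 4  # verifier body length (0)
    + 4  # accept_stat (SUCCESS)
)
NFS_STAT = 4             # nfsstat at the start of the WRITE result (attrstat)
FATTR = 11 * 4 + 3 * 8   # 11 four-byte fields + atime/mtime/ctime timevals

# nfsstat values from RFC 1094.
NFSSTAT = {
    0: "NFS_OK", 1: "NFSERR_PERM", 2: "NFSERR_NOENT", 5: "NFSERR_IO",
    6: "NFSERR_NXIO", 13: "NFSERR_ACCES", 17: "NFSERR_EXIST",
    19: "NFSERR_NODEV", 20: "NFSERR_NOTDIR", 21: "NFSERR_ISDIR",
    27: "NFSERR_FBIG", 28: "NFSERR_NOSPC", 30: "NFSERR_ROFS",
    63: "NFSERR_NAMETOOLONG", 66: "NFSERR_NOTEMPTY", 69: "NFSERR_DQUOT",
    70: "NFSERR_STALE", 99: "NFSERR_WFLUSH",
}


def describe_write_reply(length):
    """Classify an NFSv2 WRITE reply by its RPC message length."""
    if length == RPC_REPLY_HEADER + NFS_STAT + FATTR:
        return "status + file attributes (NFS_OK)"
    if length == RPC_REPLY_HEADER + NFS_STAT:
        # attrstat only carries attributes when the status is NFS_OK
        return "status only, no attributes (an error status)"
    return "unexpected length"


def write_reply_status(payload):
    """Return the symbolic nfsstat from the RPC payload of a WRITE reply."""
    (stat,) = struct.unpack_from(">I", payload, RPC_REPLY_HEADER)
    return NFSSTAT.get(stat, f"unknown nfsstat {stat}")


for length in (96, 28):
    print(length, describe_write_reply(length))
```

Under these assumptions 96 = 24 + 4 + 68 is a normal successful write reply, while 28 = 24 + 4 is a bare status word, which the NFSv2 attrstat result only sends for a non-OK status. Passing the 28 bytes of that spike-to-marvin reply to write_reply_status() would show which error knfsd returned for the duplicated write; the capture as posted does not include those bytes.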

The system is a PII/450 with 256MB RAM and a DAC960 (driver 2.1 beta 3),
everything compiled w/ gcc 2.7.2.3. Ethernet is eepro100.

-- 
/==============================\
| David Mansfield              |
| david@cobite.com             |
\==============================/
