1. The corruption always begins at a 4096-byte-aligned offset
in the file (i.e. on a page boundary).
2. 1, 2, or 3 ZERO bytes are written at the beginning of the page
and the rest of the page is SHIFTED by that amount. (When we first
saw this we thought a SCSI controller was failing on the Sun
server, but we've had no problems with data written via
NFS to this Sun from a bunch of WinNT boxes we have here. And,
as I said earlier, 2.1.84 works fine.)
3. The location of the smashed page or pages is random. The first
is usually 4 or 5 megabytes into the file (which is 11M long) but
occasionally it is only 56K into the file.
4. The number of corrupted blocks in an 11M file is small, like
5 or 10.
Hope this provides a clue. I couldn't fathom why the data was
SHIFTED because that implies the page was COPIED someplace.
How many places in the NFS logic COPY entire pages? Perhaps that
is a place to look.
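
The pattern in points 1 and 2 can be checked mechanically against a
known-good copy of the file. What follows is only a minimal sketch, not
a tool that was used for the report: the function name is made up, and
it assumes the shifted data is simply cut off at the end of the page
(the bytes pushed past the page boundary are lost).

```python
PAGE = 4096  # page size from point 1


def find_shifted_pages(good: bytes, bad: bytes, max_shift: int = 3):
    """Return (offset, shift) for each page of `bad` that is the matching
    page of `good` with `shift` zero bytes inserted at the front.

    Assumption: the tail pushed past the page boundary is discarded.
    """
    hits = []
    for off in range(0, min(len(good), len(bad)), PAGE):
        g = good[off:off + PAGE]
        b = bad[off:off + PAGE]
        if g == b:
            continue  # page is intact
        for shift in range(1, max_shift + 1):
            # Prepend `shift` zero bytes, then truncate to the page length.
            expected = (bytes(shift) + g)[:len(g)]
            if b == expected:
                hits.append((off, shift))
                break
    return hits


# Small in-memory demo: three pages, the second one shifted by 2 bytes.
good = bytes(range(256)) * 16 * 3
bad = bytearray(good)
bad[PAGE:2 * PAGE] = b"\x00\x00" + good[PAGE:2 * PAGE - 2]
print(find_shifted_pages(good, bytes(bad)))  # prints [(4096, 2)]
```

Pages that differ but do not fit the zero-prefix pattern are skipped, so
a count lower than the number of differing pages would suggest a second
kind of damage.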
Now, a few questions:
1. How do I vary the NFS block size? (Larry asked that I try that).
2. How can I tell if I am using UDP versus TCP? I've done NOTHING
to explicitly configure NFS. We just use RedHat 5.0 out of the box
with the 2.1.X kernels.
3. Given I can determine UDP vs. TCP, how do I change it to the
other? Can I assume SunOS 5.5 supports both?
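
For reference on questions 1 to 3: the nfs(5) mount options rsize= and
wsize= set the read and write transfer sizes, and udp / tcp (or proto=
on some systems) name the transport. The sketch below only pulls those
options out of text in the /proc/mounts or /etc/mtab column layout.
Whether a given 2.1.X kernel actually lists them there is an assumption,
and the sample line, host name and function names are hypothetical.

```python
def nfs_mount_options(mounts_text: str):
    """Map each NFS mount point to a dict of its mount options.

    Expects lines in the layout: device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "nfs":
            continue  # not an NFS mount
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            opts[key] = value or True  # bare flags such as "rw" map to True
        result[fields[1]] = opts
    return result


def transport(opts):
    """Return "udp" or "tcp" if the options say so, else None (not stated)."""
    if "proto" in opts:
        return opts["proto"]
    for name in ("tcp", "udp"):
        if name in opts:
            return name
    return None


# Hypothetical sample line, not taken from the machine in this report.
sample = "sunbox:/export /mnt/sun nfs rw,rsize=1024,wsize=1024,udp 0 0\n"
for mount_point, opts in nfs_mount_options(sample).items():
    print(mount_point, opts.get("rsize"), opts.get("wsize"), transport(opts))
```

If the transport comes back as None, the options do not state it and the
mount is using whatever the client defaults to.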
I'll run the NFS debug log experiment today and send you both the
diffs.
Last, we have CONFIG_NFS_FS and CONFIG_NFSD set up as kernel modules
and we have the RPMs 'nfs-server-2.2beta29-2' and
'nfs-server-clients-2.2beta29-2' installed.
-Ben McCann
--
Ben McCann
Indus River Networks
31 Nagog Park
Acton, MA, 01720
email: bmccann@indusriver.com
web: www.indusriver.com
phone: (978) 266-8140
fax: (978) 266-8111