[AGAIN] Problem: File-corruption when reading from a network-filesystem

From: Thomas Lenherr
Date: Fri Oct 22 2004 - 07:47:10 EST


[See http://bugzilla.kernel.org/show_bug.cgi?id=3608 ]

Hi,

I've got some files (9 x ~200mB) on a server with nfs and smb installed
and built the md5sums of these files on the server itself. Then I
mounted the nfs on my workstation and let it check the md5sums of these
files: something about 7/9 did not match. Then i just started md5sum
again and let it check again the same files, the result: all md5sums
matched! (And I saw on my network monitor there was almost load on the
network as the files were still cached locally so it had only to check
if they changed) (fyi: I got 2gB ram).
Then I made the same again with smbfs: i mounted the shares on my
workstation, ran md5sum and got something like 3/9 unmatched files, ran
it again and everything matched (again: the second time i got almost no
transfer).
Then I tested the local filesystem (so i copied files from local to
local and so on) and everything was fine.
In my next test i copied 4 files from the nfs on my local drive and let
them check (3/4 did not match) and repeating this check had always the
same result: always the same 3 files did not match and always the same
file did match, so it seems, they really were saved that way. Then I
deleted these local files, started cp again (this time it did not
transfer much => cache) and made the check again: all files were ok!
(same with smb)

BTW: I checked the server with an other workstation and everything went
fine, so the server seems to be ok.

BTW2: When I do not mount the smbfs, but rather use smbclient to get the
files they were always ok.

So it seems the files are transferred and cached correct on the client,
but it's somehow badly read if it's not already cached...

Here some infos about my systems:
workstation:
- kernel vanilla 2.6.9 (on a gentoo-system) (not tainted)
- motherboard: asus pc-dl deluxe (bios version 1007)
- dual xeon 3.06ghz
- 2gb non-ecc ram (made a memtest)
- nfsv3 (tried as modul and built-in)

server:
- kernel vanilla 2.6.8.1 (on a gentoo-system) (did not update because it
seems, the error is on the workstation)
- pII 400mhz
- 320mB non-ecc ram
- nfsv3 (built-in)
- smb 3.0.2a


I hope this is enough information... else just tell me what you need to
know....

One last thing: I tried the whole thing from a knoppix 3.4 (kernel
2.6.?) once from my workstation and once again from a third pc (beneath
the server and my workstation):
same result: everything went fine from the third pc, but with the same
knoppix on my workstation i got the results with the caching: when i
accessed the files the second time everything was fine...
So I _think_ it must be some kind of a hardware-dependent kernel-bug of
the 2.6 :(

anyway, TIA

Thomas Lenherr

Attachment: signature.asc
Description: OpenPGP digital signature