Re: NFS problems in 2.1.54

Steven N. Hirsch (shirsch@ibm.net)
Thu, 11 Sep 1997 18:57:40 -0400 (EDT)


On Thu, 11 Sep 1997, Bill Hawes wrote:

> Steven N. Hirsch wrote:
> > There is something still quite broken with the NFS client in 2.1.54.
>
> >[...]
>
> > I have noted this behavior starting with 2.1.36, FWIW. After all the
> > discussion about fixes to the NFS subsystem, I had high hopes. Those
> > folks hacking in the NFS code would be advised to use kernel patching and
> > compilation as a bug trigger. It scores 100% at my end.
> >
> > Bill Hawes: If you have any ideas, or would like me to do any on-point
> > testing, please say the word!
>
> Hi Steve,
> I'd like to take you up on your offer to help test and track down NFS
> bugs :-) I'll be reviewing the NFS client code over the next couple of
> weeks, so if you (or anyone) can provide specific bug reports, it would
> help a lot.
>
> The most helpful approach would be to run an NFS test suite (perhaps
> Olaf Kirch has one?), and to use strace to pinpoint failing calls.

I'll see if I can find something suitable.

> Kernel compilation is a good go/no go test, but if it fails, it's hard
> to tell what went wrong because of the complexity of the build process.
> But if you report that "mv fails when the source is a symlink with xxx
> permission", someone can probably fix the call right away.

If I could nail it down to that level, I'd fix it myself <g>.

> My impression is that most of the NFS client is pretty solid and that there
> are just a few small problems that need to be fixed.
>

Well, we've got an opportunity. The newer NFS clients and server have
been blazingly fast here on benchmarks, but absolutely unusable for
large-scale builds. The failure rate is literally 100% on "make dep".

I do recall that one of the actions from this target (excerpted from
linux/Makefile) triggered it:

dep-files: scripts/mkdep archdep include/linux/version.h
        scripts/mkdep init/*.c > .tmpdepend
-->     scripts/mkdep `find $(FINDHPATH) -follow -name \*.h ! -name \
                modversions.h -print`
        set -e; for i in $(SUBDIRS); do $(MAKE) -C $$i fastdep; done
        mv .tmpdepend .depend

I know this sounds bizarre, but the piped output from 'find' came back
truncated and garbled. Needless to say, this stopped things dead in
their tracks. I think "find" itself received corrupted data from the NFS
mounted source tree. Perhaps this type of recursive processing hammers
NFS at the resonant frequency (so to speak)?
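
A minimal sketch of how the 'find' symptom might be isolated from the rest
of the build: run the same header search several times and compare a
checksum of the sorted listing. The function name check_find_stable and
its arguments are hypothetical, and the directory to search is whatever
NFS-mounted include tree is under suspicion. This is only a rough probe,
not a proper NFS test suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree).
# Runs the same 'find' expression used by the dep-files target several
# times and reports the first run whose listing differs from run 0.
#   $1 = directory to search (assumed to be on the NFS mount)
#   $2 = number of runs (defaults to 5)
check_find_stable() {
    dir=$1
    runs=${2:-5}

    # Checksum of the sorted listing from the reference run.
    ref=$(find "$dir" -follow -name '*.h' ! -name modversions.h -print \
          | sort | cksum)

    n=1
    while [ "$n" -lt "$runs" ]; do
        cur=$(find "$dir" -follow -name '*.h' ! -name modversions.h -print \
              | sort | cksum)
        if [ "$cur" != "$ref" ]; then
            echo "run $n differs: got '$cur', expected '$ref'"
            return 1
        fi
        n=$((n + 1))
    done

    echo "stable: $ref"
    return 0
}
```

Something like "check_find_stable /mnt/nfs/linux/include 20" (path made
up) would then show whether the listing changes between runs. If every
run is corrupted the same way this reports "stable", so comparing against
a checksum produced directly on the server would be the stronger check.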

Another reliably repeatable (though not 100% so) failure mode occurs when
performing a large kernel patch over NFS. Eventually, something will fail
with a spurious error and leave the target file mangled.
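
One possible way to pin down which files get mangled, sketched under the
assumption that the same patch can also be applied to a copy of the tree
on local disk: build a checksum manifest of each tree and diff the two.
The function names tree_manifest and compare_trees are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not from any existing test suite).

# Print one "checksum size path" line per regular file under $1,
# sorted by path so two manifests can be compared line by line.
tree_manifest() {
    (
        cd "$1" || exit 2
        find . -type f -print | sort | while IFS= read -r f; do
            printf '%s %s\n' "$(cksum < "$f")" "$f"
        done
    )
}

# Diff the manifests of two trees, e.g. a locally patched tree ($1)
# against the tree patched over NFS ($2).
# Returns 0 if they match; otherwise prints the differing lines and
# returns diff's non-zero status.
compare_trees() {
    m1=$(mktemp) || return 2
    m2=$(mktemp) || { rm -f "$m1"; return 2; }
    tree_manifest "$1" > "$m1"
    tree_manifest "$2" > "$m2"
    rc=0
    diff "$m1" "$m2" || rc=$?
    rm -f "$m1" "$m2"
    return "$rc"
}
```

Running "compare_trees /usr/src/linux-local /mnt/nfs/linux" (both paths
made up) after patching each tree would list any file whose contents
differ, which gives a specific file name to go with the spurious error.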

I use these same networked machines _heavily_ under the old user-space NFS
server and 2.0.x client. I never had a single problem until trying
post-2.1.36 kernels.

Steve