Re: NFS hang + umount -f: better behaviour requested.

From: Peter Staubach
Date: Tue Aug 21 2007 - 15:05:23 EST


John Stoffel wrote:
"Peter" == Peter Staubach <staubach@xxxxxxxxxx> writes:

Peter> John Stoffel wrote:
Robin> I'm bringing this up again (I know it's been mentioned here
Robin> before) because I had been told that NFS support had gotten
Robin> better in Linux recently, so I have been (for my $dayjob)
Robin> testing the behaviour of NFS (autofs NFS, specifically) under
Robin> Linux with hard,intr and using iptables to simulate a hang.
So why are you mouting with hard,intr semantics? At my current
SysAdmin job, we mount everything (solaris included) with 'soft,intr'
and it works well. If an NFS server goes down, clients don't hang for
large periods of time.

Peter> Wow! That's _really_ a bad idea. NFS READ operations which
Peter> timeout can lead to executables which mysteriously fail, file
Peter> corruption, etc. NFS WRITE operations which fail may or may
Peter> not lead to file corruption.

Peter> Anything writable should _always_ be mounted "hard" for safety
Peter> purposes. Readonly mounted file systems _may_ be mounted
Peter> "soft", depending upon what is located on them.

Not in my experience. We use NetApps as our backing NFS servers, so
maybe my experience isn't totally relevant. But with a mix of Linux
and Solaris clients, we've never had problems with soft,intr on our
NFS clients.

We also don't see file corruption, mysterious executables failing to
run, etc.

Now maybe those issues are raised when you have a Linux NFS server
with Solaris clients. But in my book, reliable NFS servers are key,
and if they are reliable, 'soft,intr' works just fine.

Now maybe if we had NFS exported directories everywhere, and stuff
cross mounted all over the place with autofs, then we might change our
minds.

In any case, I don't dis-agree with the fundamental request to make
the NFS client code on Linux easier to work with. I bet Trond (who
works at NetApp) will have something to say on this issue.

Just for the others who may be reading this thread --

If you use sufficient network bandwidth and high quality
enough networks and NFS servers with plenty of resources,
then you _may_ be able to get away with "soft" mounting
for a some period of time.

However, any server, including Solaris and NetApp servers,
will fail, and those failures may or may not affect the
NFS service being provided. In fact, unless the system
is being carefully administrated and the applications are
written very well, with error detection and recovery in
mind, then corruption can occur, and it can be silent and
unnoticed until too late. In fact, most failures do occur
silently and get chalked up to other causes because it will
not be possible to correlate the badness with the NFS
client giving up when attempting to communicate with an
NFS server.

I wish you the best of luck, although with the environment
that you describe, it seems like "hard" mounts would work
equally well and would not incur the risks.

ps
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/