Re: NFS

Dan Merillat (Dan.Merillat@ao.net)
Wed, 6 Mar 1996 09:00:00 -0500 (EST)


On Wed, 6 Mar 1996, Alan Cox wrote:

> Date: Wed, 6 Mar 1996 11:30:15 +0000 (GMT)
> From: Alan Cox <alan@cymru.net>
> To: Dan Merillat <Dan.Merillat@ao.net>
> Cc: linux-kernel@vger.rutgers.edu
> Subject: Re: NFS
>
> > client# mount server:/usr/local/other /usr/local/other
> > mount: server:/usr/local/other already mounted or /usr/local/other busy
>
> It's already mounted. NFS is stateless, so a server crash isn't a problem;
> your program should just recover and carry on after a reboot.

> > client# mount -o remount /usr/local/other
> > aha, success!
> > client# cd /usr/local/other
> > client# ls
> > and it hangs.
>
> That's a funny one, and probably a bug.

Very weird. I finally got them all back by moving the mount point to another
directory and back. Suddenly, all those processes waiting on disk finished.

Even with the server back up and serving NFS, any new process would hang.

(Which is why I was thinking stupid things about NFS and remounting...)

> > 2) Anything on NFS mounted partitions can get it dropped out from under them,
> > without hanging the machine (EIO or similar)
> See the "soft" option, and also "intr".

I.e., run "man xxxx" before complaining about xxxx, otherwise known as RTFM.
:-(
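
For the record, a minimal sketch of what this looks like from the program's
side. With a "soft" mount, a request that times out is handed back to the
caller as an error instead of blocking forever. The helper names below
(read_or_errno, describe) are made up for illustration, and the assumption
that the error arrives as EIO is mine; the exact errno can depend on the
kernel.

```python
import errno
import os


def read_or_errno(path, size=4096):
    """Return (data, 0) on success, or (None, errno) on failure.

    Hypothetical helper: it reports the error code instead of raising,
    so the caller can decide whether to retry or give up.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return None, e.errno
    try:
        return os.read(fd, size), 0
    except OSError as e:
        # Assumption: on a soft NFS mount, a timed-out request surfaces
        # here, typically as errno.EIO.
        return None, e.errno
    finally:
        os.close(fd)


def describe(err):
    """Turn an errno value from read_or_errno into a short message."""
    if err == 0:
        return "ok"
    if err == errno.EIO:
        return "I/O error (server unreachable on a soft mount?)"
    return os.strerror(err)
```

A caller would do something like
data, err = read_or_errno("/usr/local/other/somefile") and print
describe(err) when data is None; the file name is only a placeholder.
On a "hard" mount without "intr", the read would simply block instead.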

> > 3) Remount of NFS partition should re-contact the server and really
> > re-establish the connection, instead of just returning success.
>
> NFS is stateless; it has no concept of a remount.

Well, then perhaps a refresh? I know the server was up, but new
processes just hung anyway. I'll try to reproduce that one.

> Processes in state 'D' cannot be killed, they are in that state as they are
> in a kernel wait that cannot be recovered from. A zombie is already dead.

Yes, but both can accumulate in strange circumstances. How many times
have you had a zombie process that wouldn't finish dying? Not a major
problem, but they are ugly on a ps.
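
As an aside, the state letter that ps shows ('D', 'Z', and so on) can be read
straight out of /proc/<pid>/stat, where it is the field after the
parenthesised command name. A rough sketch follows; the function names
parse_state and process_state are made up for illustration, and it assumes a
Linux /proc is mounted.

```python
def parse_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split after the LAST ')' rather than on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def process_state(pid):
    """Return the state letter for a live process, e.g. 'D' or 'Z'."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_state(f.read())
```

Walking the numeric entries in /proc and keeping the pids whose state is 'D'
or 'Z' gives the list of stuck and zombie processes without parsing ps
output.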

>
> > How bad would it be to have a process waiting on disk activity to die?
> > I suppose you would need to put something there to catch the return from
> > the disk, but that would only eat a buffer -> /dev/null, so that's no biggie
> > (is it?)
>
> It would not be pretty. You can, if you feel like it, go and work through all
> the sleep_on and interruptible_sleep_on code and migrate the former to the
> latter, adding all the recovery stuff. In many cases that's a big job.

Ugh. I was hoping it could be done by pointing the (possible) disk
return to an empty buffer, and returning an IO error to the waiting
process, which also has a signal 9 pending. I'll look through the
code and check that out.

--Dan