nfs3 problem: aix-server, amd, linux 2.4.10 - 2.4.17pre8 client

From: Birger Lammering (b.lammering@science-computing.de)
Date: Tue Dec 18 2001 - 10:10:39 EST

Hi,

I forgot to cc this to linux-kernel and amd-dev.

You might remember:
(http://www.uwsg.indiana.edu/hypermail/linux/kernel/0111.2/0562.html)
The bug is still hiding somewhere between the nfs3 client (or server?)
and amd....

To: Ion Badulescu <ion@cs.columbia.edu>, trond.myklebust@fys.uio.no
Date: Mon, 17 Dec 2001 19:18:29 +0100

Hi Ion and Trond,

Ion Badulescu writes:
> So, can you try to get the same /proc/mounts line while mounting by hand,
> and see if the problem re-appears? The command should be something like
> mount -o nosuid,nodev,vers=3,tcp,intr,hard,rsize=32768,wsize=32768 ...

ok, done. But there was no lock-up.
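
To compare the options of the hand-made mount with the ones amd ends
up with, something like the following could pull the option string for
one mount point out of /proc/mounts. It is only a sketch: it assumes
the glibc <mntent.h> interface on Linux, and the function name
mount_options is made up for illustration.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Copy the option string of the entry mounted at `dir` from a
 * mounts-format file (e.g. /proc/mounts) into `out`.
 * Returns 0 if an entry was found, -1 otherwise.
 * mount_options is a hypothetical helper name, not part of any library. */
int mount_options(const char *table, const char *dir,
                  char *out, size_t outlen)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *ent;
    int rc = -1;

    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, dir) == 0) {
            snprintf(out, outlen, "%s", ent->mnt_opts);
            rc = 0;   /* keep scanning: the last matching entry wins */
        }
    }
    endmntent(fp);
    return rc;
}
```

Calling it once for the manual mount and once for the amd mount, both
with "/proc/mounts" as the table, gives two strings that can be
compared directly.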

> The reason I'm not really suspecting amd, but rather the kernel NFS
> client, is because amd is only involved in mounting the server, it doesn't
> do much after that. So unless there is a race condition somewhere which
> involves quickly unmounting and remounting the same share, I don't see how
> amd could be the cause here.
>
> > It's not only the bug in the TCP/IP or NFS driver that is
> > interesting. I guess for tracing it, it would be cool to have some
> > hint on what triggers the bug. So far I could not reproduce it with
> > manual mounts (I've printed out the man page and tried all kinds of
> > mount options already :-).
>
> Well, try the above. If that mount command doesn't reproduce the lock-up,
> then try forcing amd to keep the share mounted (simply have a shell
> chdir'ed into that directory) and see if it still locks up. If it does,
> then I'm at a loss...

The cp was started after cd'ing into the target directory, and it
still locked up, so there is almost certainly no race condition caused
by quickly mounting and unmounting the share (btw, we would have seen
that in the tcpdump). I even ran "ls -l" in another shell and saw the
file size grow to 786432 bytes before cp locked up. The remainder of
the file was copied in one go.
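
The "ls -l" check could also be automated, so that the size at which
the copy stalls gets recorded without watching by hand. The following
is a rough sketch using plain stat() polling; file_size and
wait_for_stall are hypothetical names, and the poll interval and stall
threshold are arbitrary choices of the caller.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of `path` in bytes, or -1 if stat() fails. */
long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Poll `path` every `interval_sec` seconds, at most `max_polls` times.
 * Returns the size at which the file stayed unchanged for `stall_polls`
 * consecutive polls, or -1 if it kept growing or could not be stat'ed.
 * Hypothetical helper, for illustration only. */
long long wait_for_stall(const char *path, unsigned interval_sec,
                         int stall_polls, int max_polls)
{
    long long last = file_size(path);
    int unchanged = 0;

    for (int i = 0; i < max_polls && last >= 0; i++) {
        sleep(interval_sec);
        long long now = file_size(path);
        if (now < 0)
            return -1;
        if (now == last) {
            if (++unchanged >= stall_polls)
                return last;      /* size has stopped moving */
        } else {
            unchanged = 0;        /* still growing, start counting again */
            last = now;
        }
    }
    return -1;
}
```

A finished copy looks the same as a stalled one to this check, so the
reported size has to be compared with the size of the source file. If
the file is watched over NFS, attribute caching on the client may also
delay the size that stat() reports.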

The lock-up cannot be reproduced in a trivial way without amd, and the
share is not unmounted during the copy attempt. I have no clue how to
nail down the bug, unless Trond finds something by inspecting the
nfs-related changes from Linux 2.4.9 to 2.4.10.... (hint, hint :-)
2.4.9 and older don't show this behaviour...

An idea for a possible (and ugly) workaround that came up here was to
tell amd to use the mount command rather than the mount system call.
This can be done by editing the NIS map. I find this rather
inconvenient for our purpose - to say the least :-/. Would it be
possible to invent an amd.conf option (e.g. 'nfs_program=mount') that
tells amd to use the mount/umount programs rather than the system
calls? Or can I replace the mount system call in
conf/mount/mount_linux.c by a system("mount ...") call and recompile?
:-)
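
To make the second idea a bit more concrete, here is a minimal sketch
of running the mount program as a child process. It is not taken from
amd and says nothing about how mount_linux.c is actually structured;
build_mount_argv and run_program are hypothetical helpers. It uses
fork/execvp instead of system() so that the option string is passed as
a single argument and never goes through a shell.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill argv[] for: mount -t nfs -o <opts> <source> <target>
 * Returns the number of arguments, not counting the NULL terminator.
 * Hypothetical helper, for illustration only. */
int build_mount_argv(const char *opts, const char *source,
                     const char *target, const char *argv[8])
{
    int n = 0;

    argv[n++] = "mount";
    argv[n++] = "-t";
    argv[n++] = "nfs";
    argv[n++] = "-o";
    argv[n++] = opts;
    argv[n++] = source;
    argv[n++] = target;
    argv[n] = NULL;
    return n;
}

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the exit status of the program, 127 if it could not be
 * executed, or -1 on fork/wait failure or abnormal termination. */
int run_program(const char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);               /* only reached if execvp failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would fill the vector with the options from the manual test
(nosuid,nodev,vers=3,tcp,intr,hard,rsize=32768,wsize=32768), the server
export and the mount point, then hand it to run_program and treat any
non-zero return as a failed mount.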

Cheers,
Birger

This archive was generated by hypermail 2b29 : Sun Dec 23 2001 - 21:00:16 EST