Re: NFS client locking hangs for period

From: Christian Reis (kiko@async.com.br)
Date: Sun Jan 26 2003 - 11:02:00 EST


On Sat, Jan 25, 2003 at 02:54:09PM +1100, Neil Brown wrote:
> Hmmm. So you have several clients all mounting the same root
> filesystem, and mounting it writable? That doesn't sound like a plan
> for success. How do you make sure the clients don't tread over each
> other when using /etc files?

The truth is few (broken wrt the FHS) programs actually write to /etc. I
have set up everything so nothing is written to in /etc, and it actually
works very well (have to use a special init(8) that doesn't write to
/etc/ioctl.save). This setup has been running for almost a year now,
with the locking problem being the only one left to fix.

> I suspect that what you really want is to mount root read-only, or
> mount separate roots for each client, and then in either case to mount
> with the "nolock" flag.

Well, mounting root read-only is a good idea but it sacrifices being
able to administer the system from any station, and it also puts a lot
of burden on me to fix *all* programs to not write to anywhere on it.
This shouldn't be too hard, but we're still just working around the bug,
which I would really like to identify and fix.

> I suspect that your problem is related to the client trying to do
> locking, but no having statd running on the client.

I am 100% positive statd runs on every single client. This problem here
only happens spuriously. It goes away when I restart nfsd and mountd
(in that order). It really does look like a bug <wink>

> You cannot meaningfully do locking on an NFS mounted root filesystem.
> Infact, I think it would be good if the default mount options for nfs
> root included nolock... and if I read fs/nfs/nfsroot.c:root_nfs_name
> correctly, nolock is the default. Are you overriding that default
> be explicitly setting "lock"??

Nope. I've just tested and the default (specifying no lock option upon
bootup) really is nolock:

/dev/root on / type nfs (rw,v3,rsize=8192,wsize=8192,hard,udp,nolock,addr=192.168.99.4)

I wonder why you can't do locking on NFS root (if it's a current
limitation of if it doesn't make sense).

But I also think this problem shouldn't be happening if no locking was
going on. And when I checked using nlm_debug it sure did seem locking
was being used. What do you make of it?

Take care,

--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jan 31 2003 - 22:00:15 EST