NFS client problems in 2.4.18 to 2.4.20

From: Joshua Weage
Date: Fri Sep 05 2003 - 15:47:46 EST


I hope this was not discussed previously, I couldn't find anything
relevant in the archives.

I am having problems with NFS clients getting stuck after reporting a
"nfs server not responding message". The majority of the time the
mount starts working again when the nfs server load goes down.
However, sometimes the mount on one client becomes completely
unresponsive, but all of the clients still work correctly. Even after
letting it set for 2-3+ hours it still doesn't come back up. I can
ping the server from the locked client and that works. If I do a lazy
unmount and then remount the NFS disk it works again for awhile - but
tends to lock up again. A standard umount doesn't work when the client
is hung.

This happens with all RedHat kernel releases 2.4.18 to 2.4.20.

I have tried tuning the NFS server by going to nfs utils 1.0.3 and by
increasing nfsd's and the socket buffer sizes. I have also increased
the timeout on the clients to 2.0. One thing that seems to help is to
enable async mode on the NFS server. However, I've still seen the same
client hang with async turned on.

Machine Details:
12x Cluster nodes 2xAMD Athlon MP's, 100 MbEthernet
1x server 2xPentium III 1.13GHz, Adaptec 39160, Promise RM8000,
GigEthernet
1x Cisco 2924-T switch.

I'm running 8 CPU jobs, each cpu occasionally writes 120MB files to the
NFS disk. The client lockup always occurs during these file writes.
The lockups have occured on several of the cluster nodes.

Any suggestions on what could be causing this?

Thanks,

Joshua Weage



=====


__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/