Re: NFS locking

From: Neil Brown
Date: Wed May 10 2006 - 20:14:07 EST


On Wednesday May 10, chutz@xxxxxxxxxxxx wrote:
> We have an NFS server here with a fairly high load. The clients are
> Linux, FreeBSD and Solaris. The exported filesystem is XFS, which is on
> an LVM drive. After between 3 and 30 days it seems that locking
> completely stops working; clients generally either error or simply lock
> up when they try to lock a file. The only way to fix it seems to be a
> reboot.

Reboot the client or the server?

>
> Last time it happened was on 2.6.17-rc2; it started around 2.6.15.
>
> There is nothing in the dmesg on the server, the (Linux) clients are
> printing this in the dmesg when something tries to create a lock:
>
> lockd: server xxx.xxx.xxx.xxx not responding, still trying
> lockd: server xxx.xxx.xxx.xxx not responding, still trying
> lockd: server xxx.xxx.xxx.xxx not responding, still trying
> lockd: server xxx.xxx.xxx.xxx not responding, still trying

Sounds like the server has locked up.
What does 'ps' on the server show for 'lockd'? Is it in state 'D'
(uninterruptible sleep)? What is the 'wchan'? Are any 'nfsd's
permanently in 'D'?
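
One way to collect that from the server is sketched below. It assumes the
procps 'ps' found on most Linux systems; the helper names and the column
selection are only illustrative choices, not anything the kernel or nfs-utils
provides.

```shell
# List the kernel NFS threads with their state and wait channel.
# 'wchan:32' widens the column so that symbol names are not truncated.
list_nfs_threads() {
    ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $4 ~ /^(lockd|nfsd)$/'
}

# Filter such a listing (read from stdin) down to the lockd/nfsd threads
# whose state starts with 'D', i.e. uninterruptible sleep.
only_d_state() {
    awk '$4 ~ /^(lockd|nfsd)$/ && $2 ~ /^D/'
}
```

Running 'list_nfs_threads | only_d_state' a few times, some seconds apart,
should show whether the same threads stay in 'D' rather than just passing
through it.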

Try
echo t > /proc/sysrq-trigger

and see what the stack trace for lockd is - probably only useful if it
is in 'D'.
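
The task dump lands in the kernel log, so the lockd part has to be picked out
of 'dmesg' afterwards. A rough sketch, assuming a 'grep' that supports '-A'
(GNU and BSD grep do); the helper name and the default of 25 lines are
arbitrary, and the exact layout of the dump differs between kernel versions:

```shell
# Print every line mentioning the task name, plus the N lines after it.
# Reads a kernel log on stdin. $1 = task name, $2 = N (default 25, a guess
# at how deep a stack trace is likely to be).
trace_for() {
    grep -A "${2:-25}" -e "$1"
}
```

Typical use, as root on the server, would be 'echo t > /proc/sysrq-trigger'
followed by 'dmesg | trace_for lockd'. On a machine with many tasks the dump
may be larger than the kernel log buffer, in which case the syslog files are
probably a better place to look.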

Maybe a 'tcpdump -s 1500' of traffic between client and server would
help.
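
For a capture that can be looked at later, it probably makes sense to write
the packets to a file and restrict them to one client. The sketch below only
prints the command rather than running it; the helper name, the default file
name and the example address are made up for illustration. It filters by host
rather than by port, since lockd does not necessarily sit on a fixed port.

```shell
# Print (rather than run) a tcpdump command for traffic to and from one peer.
# $1 = peer address, $2 = output file (the default name is hypothetical).
capture_cmd() {
    printf 'tcpdump -s 1500 -w %s host %s\n' "${2:-lockd.pcap}" "$1"
}
```

For example, 'capture_cmd 192.0.2.10' prints a command that, run as root on
the server while a client is trying to take a lock, should record both the
lock requests and whatever the server sends back, if anything.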

NeilBrown
