Re: NFS locking

From: Patrick McLean
Date: Wed May 10 2006 - 21:44:53 EST


Neil Brown wrote:
> On Wednesday May 10, chutz@xxxxxxxxxxxx wrote:
>> We have a NFS server here with a fairly high load. The clients are
>> Linux, FreeBSD and Solaris. The exported filesystem is XFS, which is onb
>> a LVM drive. After between 3 and 30 days it seems that locking
>> completely stops working, clients generally either error or simply lock
>> up when they try to lock a file. The only way to fix it seems to be a
>> reboot.
>
> Reboot the client or the server?
>

The server, rebooting the clients had no effect.

>> Last time it happened was on 2.6.17-rc2, it started around 2.6.15.
>>
>> There is nothing in the dmesg on the server, the (Linux) clients are
>> printing this in the dmesg when something tries to create a lock:
>>
>> lockd: server xxx.xxx.xxx.xxx not responding, still trying
>> lockd: server xxx.xxx.xxx.xxx not responding, still trying
>
> Sounds like the server has locked up.
> What does 'ps' on the server show for 'lockd'? Is it in 'D'? What is
> the 'wchan'? Are any 'nfsd's permanently in 'D'?
>
> Try
> echo t > /proc/sysrq-trigger
>
> and see what the stack trace for lockd is - probably only useful if it
> is in 'D'.
>
> Maybe a 'tcpdump -s 1500' of traffic between client and server would
> help.

We have already rebooted the server this time around, we will do the stack trace
and tcpdump from a client next time it happens.

Though, I do seem to remember that lockd was in the "D" state on the server when
it happened this afternoon. Restarting the nfs service on the server did spawn a
new lockd process, but did not fix the problem.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/