Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

From: Trond Myklebust
Date: Wed Apr 18 2007 - 09:16:11 EST


On Wed, 2007-04-18 at 07:38 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
>
> http://iucha.net/nfs/21-rc7-nfs2/meminfo
>
> > Also the result of `echo m > /proc/sysrq-trigger'
>
> http://iucha.net/nfs/21-rc7-nfs2/big-copy
>
> This has 'echo m > /proc/sysrq-trigger', 'echo t >
> /proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

Thanks.

So it looks as if you have a massive backlog of requests waiting in the
RPC layer to get sent. That would indeed trigger the BDI congestion
control, which in turn prevents you from sending more requests. The
interesting bit is this:

[ 399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout -rpcwait -action- ---ops--
[ 399.665338] 40373 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0de0 0 xprt_resend ffffffff804196bf ffffffff80440b10
[ 399.665345] 40391 0001 0001 -11 ffff81007f418508 100003 ffff810078eb05c8 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665351] 40392 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0128 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665358] 40393 0001 0001 -11 ffff81007f418508 100003 ffff810078eb1158 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665364] 40394 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0f08 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665371] 40395 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0000 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665377] 40396 0001 0001 -11 ffff81007f418508 100003 ffff810078eb1030 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665384] 40397 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0cb8 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665390] 40398 0001 0001 -11 ffff81007f418508 100003 ffff810078eb06f0 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665397] 40399 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0940 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665404] 40400 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0818 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665410] 40401 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0378 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.665417] 40402 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0250 0 xprt_sending ffffffff804196bf ffffffff80440b10
[ 399.669252] 41086 0001 0001 0 ffff81007f418508 100003 ffff810078eb0a68 15000 xprt_pending ffffffff804196bf ffffffff80440b10
[ 399.669258] 41087 0001 0001 -11 ffff81007f418508 100003 ffff810078eb0b90 0 xprt_resend ffffffff804196bf ffffffff80440b10
[ 399.669265] 41088 0001 0001 -11 ffff81007f418508 100003 ffff810078eb04a0 0 xprt_sending ffffffff804196bf ffffffff80440b10

There is only one request on the 'pending' queue. That would usually
indicate that the connection to the server is down. Can you check with
"netstat -t" whether there is a connection to the server in the
'ESTABLISHED' state? Please also repeat the command a couple of times
to see whether the socket/port number on the connection changes.
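
If repeating the command by hand is tedious, the check can be scripted. The following is only a sketch under several assumptions: it runs "netstat -tn" (numeric output, which is my addition to the "netstat -t" above), it expects the usual Linux column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), it assumes an IPv4 server address, and it defaults to port 2049 for NFS. The function names are hypothetical.

```python
import subprocess
import time

def established_local_ports(netstat_output, server_ip, server_port=2049):
    """Return local ports that have an ESTABLISHED TCP connection to the server."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        host, _, port = foreign.rpartition(":")
        if host == server_ip and port == str(server_port):
            ports.add(int(local.rpartition(":")[2]))
    return ports

def watch(server_ip, samples=5, interval=2.0):
    """Sample netstat a few times and report whether the local port changes."""
    previous = None
    for _ in range(samples):
        out = subprocess.run(["netstat", "-tn"], capture_output=True,
                             text=True, check=True).stdout
        current = established_local_ports(out, server_ip)
        if not current:
            print("no ESTABLISHED connection to", server_ip)
        elif previous and current != previous:
            print("local port changed:", sorted(previous), "->", sorted(current))
        else:
            print("ESTABLISHED, local port(s):", sorted(current))
        previous = current
        time.sleep(interval)
```

Calling watch() with the server's address prints one line per sample. A local port that keeps changing between samples would suggest the client is repeatedly reconnecting.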

Cheers
Trond