Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than120 seconds"

From: Trond Myklebust
Date: Tue Nov 25 2008 - 08:28:46 EST


On Tue, 2008-11-25 at 07:09 +0000, Ian Campbell wrote:
> On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> > On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > > So far I have bisected down to this range and am currently testing
> > > > acee478 which has been up for >4days.
> > >
> > > Another update. It has now bisected down to a small range
> > >
> > > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> > >
> > > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> > >
> > > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
>
> According to bisect:
>
> e06799f958bf7f9f8fae15f0c6f519953fb0257c is first bad commit
> commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
> Author: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> Date: Mon Nov 5 15:44:12 2007 -0500
>
> SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
>
> By using shutdown() rather than close() we allow the RPC client to wait
> for the TCP close handshake to complete before we start trying to reconnect
> using the same port.
> We use shutdown(SHUT_WR) only instead of shutting down both directions,
> however we wait until the server has closed the connection on its side.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>
> I've started testing 2.6.26 + revert. It's been a long while since I
> started this process so I'll also have a go at an up to date version.
>
> Cheers,

That would indicate that the server is failing to close the TCP
connection when the client closes on its end.

Could you remind me what server you are using? Also, does 'netstat -t'
show connections that are stuck in the CLOSE_WAIT state when you see the
hang?

Trond

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/