Re: [TCP bug] stuck distcc connections in latest -git

From: David Newall
Date: Tue Jul 22 2008 - 09:46:02 EST


Ingo Molnar wrote:
> hm, the distcc TCP hangs are back:
>

The missing four client-side connections are more interesting than the
unsent data.

> I.e. the client side send-queue is stuck in established state, server
> side thinks it's a proper established connection. Nobody makes any
> progress.
>

I might be missing something obvious, but I don't think there's anything
unusual in the three sessions displayed on the client. They should be
"ESTABLISHED" on both the client and the server, which is what they are.

> Also note the final 4 connections on the server side - those are not
> present on the client box.
>

Now this is interesting. I would be much more interested in how the
client sides of these connections disappeared.
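
One way to pin down exactly which connections exist on only one side is to
diff the two connection tables mechanically. The following is a minimal
sketch, assuming both boxes can supply `netstat -tn` style output (columns
Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function
names and the 192.0.2.x addresses are hypothetical, made up for illustration,
and are not taken from the original report.

```python
def parse_netstat(text):
    """Map (local, remote) -> (state, send_q) for the TCP lines in the text.

    Assumes `netstat -tn` style columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    conns = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        conns[(fields[3], fields[4])] = (fields[5], int(fields[2]))
    return conns


def missing_on_client(server_text, client_text):
    """Server-side connections that have no mirrored entry on the client."""
    server = parse_netstat(server_text)
    client = parse_netstat(client_text)
    return sorted(
        (local, remote)
        for (local, remote) in server
        if (remote, local) not in client
    )


# Hypothetical sample data (documentation addresses, invented ports).
SERVER = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 192.0.2.1:3632 192.0.2.2:40001 ESTABLISHED
tcp 0 0 192.0.2.1:3632 192.0.2.2:40002 ESTABLISHED
"""
CLIENT = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 5792 192.0.2.2:40001 192.0.2.1:3632 ESTABLISHED
"""

orphans = missing_on_client(SERVER, CLIENT)
```

Here `orphans` holds the one server-side entry whose client end is gone.
The same parsed table also exposes the Send-Q column, so stuck send queues
can be listed from the client output in the same pass.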

> The hung condition seemed permanent (i waited a couple of minutes).
>

Not nearly long enough. Retransmits can be sent as infrequently as once
every 180 seconds. I think there's an argument for using one of the various
patches that reduce TCP_RTO_MAX, for example OBATA Noboru's
(http://marc.info/?l=linux-netdev&m=118422471428855), so that you don't
have to wait unreasonably long before seeing a retransmit. Remember,
three minutes!
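
To see why a couple of minutes of watching can miss a retransmit, here is a
rough sketch of the usual exponential backoff: each unanswered retransmit
doubles the timeout until it hits the ceiling. The 1-second starting timeout
is an assumption (the real value is derived from the measured round-trip
time), the 180-second ceiling is simply the figure quoted above and has not
been checked against any particular kernel, and the function names are
hypothetical.

```python
def retransmit_gaps(initial_rto, rto_max, attempts):
    """Seconds between successive retransmits: doubling, capped at rto_max."""
    gaps = []
    rto = min(initial_rto, rto_max)
    for _ in range(attempts):
        gaps.append(rto)
        rto = min(rto * 2, rto_max)
    return gaps


def elapsed_before_attempt(initial_rto, rto_max, n):
    """Total seconds that pass before the n-th retransmit (n >= 1) goes out."""
    return sum(retransmit_gaps(initial_rto, rto_max, n))


# Assumed 1 s starting timeout; ceiling taken from the figure in this message.
default_gaps = retransmit_gaps(1.0, 180.0, 10)

# Same connection with a hypothetical patched ceiling of 30 s.
patched_gaps = retransmit_gaps(1.0, 30.0, 10)
```

Under these assumptions the gaps reach the ceiling after eight doublings,
so a connection that has been stuck for a while goes silent for the full
ceiling between attempts. With the lower ceiling, no gap exceeds 30 seconds.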


> I retried the same build 10 times and it would not reproduce - so this
> again is a hard to reproduce condition. (and there's no chance to get a
> proper tcpdump either, at these traffic levels)

You really should start that capture, and on both client and server.
You don't need to dump everything, only traffic to or from server:distcc.
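
A narrow capture along those lines might be set up as below. This is only a
sketch: it assumes distcc is listening on its default port 3632, and the
interface name, output file, snap length and server address are hypothetical
placeholders to be replaced with local values. The tcpdump options used
(-n, -i, -s, -w) are the standard ones; the filter is an ordinary pcap
expression passed as a single argument.

```python
def capture_command(server, port=3632, iface="eth0",
                    outfile="distcc.pcap", snaplen=128):
    """Build a tcpdump argv that records only traffic to or from server:port.

    A short snap length keeps the headers and drops most of the payload,
    which keeps the file small at high traffic levels.
    """
    bpf = f"host {server} and tcp port {port}"
    return ["tcpdump", "-n", "-i", iface, "-s", str(snaplen), "-w", outfile, bpf]


# Hypothetical server address; run the same command on client and server.
cmd = capture_command("192.0.2.1")
```

The resulting list can be handed to `subprocess.run(cmd)` (as root) on each
box before starting the build, and left running until the hang shows up.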