Re: After many hours all outbound connections get stuck in SYN_SENT

From: Jan Engelhardt
Date: Mon Dec 17 2007 - 18:14:49 EST



On Dec 14 2007 15:39, James Nichols wrote:
>
>However, after approximately 38 hours of operation, all outbound
>connection attempts get stuck in the SYN_SENT state. It happens
>instantaneously, where I go from the baseline of about 60-80 sockets
>in SYN_SENT to a count of 200 (corresponding to the # of java threads
>that make these calls).
>
>When I stop and start the Java application, all the new outbound
- ^

at that point, try tcpdump. It may, or may not, show something.
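
A rough sketch of what to look at while things are stuck, assuming a Linux box where /proc/net/tcp is readable. The helper name count_syn_sent is made up for illustration, and the tcpdump line in the comment assumes the interface is called eth0.

```shell
# Count IPv4 sockets in SYN_SENT by reading /proc/net/tcp.
# The fourth column ("st") is the socket state in hex; 02 is SYN_SENT.
# count_syn_sent is a hypothetical helper; pass a file name to test it offline.
count_syn_sent() {
    awk 'NR > 1 && $4 == "02" { n++ } END { print n + 0 }' "${1:-/proc/net/tcp}"
}

# While the count is high, watch whether the SYNs actually leave the box
# and whether any SYN-ACK comes back (eth0 is an assumption, run as root):
#   tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0'
```

If SYNs go out and nothing ever comes back, the problem is probably somewhere past the local stack.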

>connections still get stuck in SYN_SENT state.
>During this time, I am still able to SSH to the box

Try uploading something through rsync+ssh, or scp+ssh. If it aborts
or hangs after a while, that may be a strong indication of a crappy
router. Also, I'd advise upgrading to something newer like >=
2.6.22. There was one of those SACK-broken routers around here too,
but it seemed to have been replaced (or Linux got a mysterious fix
:-) as one day when I tried turning off SACK, rsync did not abort
anymore on new connections.
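
A minimal sketch of such an upload test, assuming timeout(1) from GNU coreutils is available (it exits with 124 when the deadline is hit). try_transfer is a hypothetical wrapper, and the host and file names in the comment are placeholders.

```shell
# try_transfer SECONDS COMMAND...
# Run a command with a deadline and report whether it finished, failed,
# or had to be killed. The command's own output is sent to stderr so that
# only the verdict appears on stdout.
try_transfer() {
    secs=$1; shift
    rc=0
    timeout "$secs" "$@" >&2 || rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "hung" ;;
        *)   echo "aborted ($rc)" ;;
    esac
}

# Example (bigfile and user@remote are placeholders):
#   try_transfer 300 scp bigfile user@remote:/tmp/
```

Running it once with SACK on and once with SACK off should show whether the setting makes a difference for established connections.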

Though, if SACK was the problem, the problem would be much more
likely to appear after the handshake. YMMV.
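
For checking which way SACK is currently set, something like the following should do. sack_state is a hypothetical helper name; the /proc path is the same one used in the quoted echo command.

```shell
# sack_state [FILE]
# Report the current SACK setting. The file holds 1 when SACK is
# enabled and 0 when it is disabled.
sack_state() {
    case "$(cat "${1:-/proc/sys/net/ipv4/tcp_sack}")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# To flip it for a test (as root):
#   sysctl -w net.ipv4.tcp_sack=0
```

The setting only matters for connections opened after the change, since SACK is negotiated in the SYN exchange.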


>and run wget to Google, cnn, etc, so the
>problem appears to be specific to the hosts that I'm accessing via the
>webservices.
>
>For a long time, the only thing that would resolve this was rebooting
>the entire machine. Once I did this, the outbound connections could
>be made successfully. However, very recently when I had one of these
>incidents I disabled tcp_sack via:
>
>echo "0" > /proc/sys/net/ipv4/tcp_sack
>
>And the problem almost instantaneously resolved itself and outbound
>connection attempts were successful. I hadn't attempted this before
>because I assumed that if any of my network
>equipment or remote hosts had a problem with SACK, that it would never
>work. In my case, it worked fine for about 38 hours before hitting a
>wall where no outbound connections could be made.
>
>I'm running kernel 2.6.18 on RedHat, but have had this problem occur
>on earlier kernel versions (all 2.4 and 2.6).