Re: Outgoing TCP connection problem

From: Dylan Griffiths (dylang+kernel@thock.com)
Date: Fri Aug 25 2000 - 03:56:23 EST


Stephen Satchell wrote:
> For what it's worth, I have exactly the same thing happen with 2.2.5: an
> attempt to make any connection to a specific location from any machine on
> my intranet fails for about two minutes, and netstat shows no
> problems. After two minutes, everything is fine. During that two-minute
> period, almost every other request goes through without
> problems. (Sometimes I'll have two or even three locations that are hosed,
> but they appear to be hosed independently because the first to freeze is
> the first to thaw, and so forth.)

Yes. I spent that time calling my ISP the first few times it happened. The
time period was somewhat more than 2 minutes for me because I probably have
less overall traffic than you, and because my ISP couldn't be bothered to
pickup the phone to confirm if they were having problems within a 10 minute
period, even if they tried :)

 
> The really bad news is that it's an intermittent problem, with absolutely
> NO predictability, and by the time I can bring enough tools to bear on the
> problem, it's "cured." It happens when the clients in use are Linux,
> Windows, or Macintosh. It can happen in any hour of the day.

I think it's some obscure bug that's being tickled. Outgoing TCP
connections timeout as if the ACK for the SYN is not being acknowledged.
After a number of connections (the trick I use is to telnet cnn.com 80 & and
then just use up+enter rapidly until I see a connection, it usually takes
about 20-30 outgoing attempts before they start succeeding again). I really
wish I knew more about this. I upgraded my system's kernel from 2.2.14 to a
2.2.17pre when Alan mentioned that a proper fix for a TCP problem under load
had been entered into the tree. No luck. I think this is an obscure bug in
the 2.2 series.. but I'm not going to put a 2.4 test kernel onto my
firewall, as it's also my webserver, mail server, etc. I can try to
reproduce the condition on an internal machine, though.

One thing is for sure. The frequency of stalls seems to vary directly with
the number of rules in the input chain.
 
> The good news is that it happens so seldom that I haven't found it
> necessary to worry about spending a lot of time on it. It happens once
> every three-four days.

For me, it happened once every month or so. Then the frequency increased to
every few weeks or so as I added another set of rules to the chain and
flushed/reloaded it. This was also very frustrating as only TCP affected.
I can ping/traceroute just fine (ICMP) to the sites I'm trying to connect
to. But all TCP connections seem to timeout :-(

--
    www.kuro5hin.org -- technology and culture, from the trenches.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:15 EST