Re: connect() bug?

Ricky Beam (root@defiant.interpath.net)
Sat, 13 Sep 1997 13:53:05 -0400 (EDT)


Letting the chips fall where they may, I quote James Mastros:
>Squid's config file has the comment:
># Some systems (notably Linux) can not be relied upon to properly
># time out connect(2) requests.
>
>Does anybody know if this is still correct? I find no mention of this bug
>on the connect(2) manpage, and I can't find connect in the kernel source at
>all!

Oh, hell yes it is... For weeks now I've been watching several Linux machines
make connects that never connect and never time out. The Bovine rc5 client
seems especially likely to step into it...

Sep 13 10:55:30 idemo1 kernel: tcp_input.c: tcp_rcv_state_process(): [0039B704]
199.72.252.1:1171 -> 207.158.192.53:2056 Syn: Ack+Rst [discard]
Sep 13 10:55:33 idemo1 kernel: tcp_input.c: tcp_rcv_state_process(): [0039B841]
199.72.252.1:1172 -> 207.88.23.246:2056 Syn: Ack+Rst [discard]

It was pointed out some time ago that the Linux network stack does not handle
every situation that can occur (and is outlined in one or more RFCs) during
the opening phase of a connection. Linux always wants to do:
(linux) Syn Sent ->
<- Syn + Ack (Remote)
(linux) Ack ->
[Socket Now Open]

It doesn't always happen like that, and it doesn't _have to_ happen like that
either. The remote side can simply Ack the Linux Syn packet, leaving the
connection half open, and then send its own Syn. Linux does not handle that.
And as I read the comments/source, the kernel thinks that's never going to
happen.
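
For reference, here is a rough sketch of how I read the SYN-SENT rules in
RFC 793. It is a simplification (no sequence numbers, no security checks) and
it is NOT the Linux kernel code; the function name, the flag strings and the
ack_ok parameter are made up for illustration.

```python
def syn_sent_receive(flags, ack_ok=True):
    """Process one incoming segment while in SYN_SENT (simplified RFC 793).

    flags  -- set of flag names on the segment: "SYN", "ACK", "RST"
    ack_ok -- hypothetical stand-in for the ACK-number check; True means
              the segment acknowledges our SYN
    Returns (next_state, reply), where reply is None if nothing is sent.
    """
    if "ACK" in flags and not ack_ok:
        # ACK for something we never sent: answer with RST unless the
        # segment itself carries RST, then discard it.
        return ("SYN_SENT", None if "RST" in flags else "RST")
    if "RST" in flags:
        if "ACK" in flags:
            # Acceptable ACK plus RST: the peer refused the connection.
            return ("CLOSED", None)
        return ("SYN_SENT", None)  # bare RST is dropped
    if "SYN" in flags:
        if "ACK" in flags:
            # The usual case: Syn+Ack from the remote, we reply with Ack.
            return ("ESTABLISHED", "ACK")
        # Syn without Ack (simultaneous open): move to SYN_RECEIVED.
        return ("SYN_RECEIVED", "SYN+ACK")
    # Neither SYN nor RST (e.g. a bare Ack): drop the segment, stay put,
    # and leave the retransmit timer running.
    return ("SYN_SENT", None)
```

Under this reading, a bare Ack followed later by the remote's own Syn still
ends up in SYN_RECEIVED rather than stuck forever, provided the retransmit
timer keeps running in the meantime.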

I went through the code where it wants to discard or reset, and forced a
connection failure at that point to prevent processes from getting hung. The
kernel networking code is ok and otherwise usable, but somewhere in there
the retransmit timers are being cleared, or expire, or _something_, and thus
there is no retransmit, and thus no connection timeout. It just sits there for
_days_ in SYN_SENT state until somebody kills or otherwise interrupts the
process. With my little bit of evil in the kernel, Bovine has not gotten
hung in a week.
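
For anyone who can't patch the kernel, the usual userland defence is to not
trust connect() to time out at all and enforce your own deadline. A minimal
sketch, assuming a POSIX-like system and IPv4; connect_with_timeout is a
made-up helper name, not something out of Squid or the Bovine client:

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Return a connected TCP socket, or raise OSError (TimeoutError on
    deadline). Name resolution of `host` may still block."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        # A non-blocking connect returns at once; EINPROGRESS is expected.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(err, os.strerror(err))
        # The socket turns writable when the handshake finishes or fails.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            # Our own deadline, independent of the kernel's timers.
            raise TimeoutError("connect to %s:%d timed out" % (host, port))
        # SO_ERROR tells us whether the handshake succeeded.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))
        sock.setblocking(True)
        return sock
    except BaseException:
        sock.close()
        raise
```

A caller would do something like connect_with_timeout("192.0.2.1", 2056, 30.0)
(address is a placeholder) and treat TimeoutError as "sleep and retry", so a
socket stuck in SYN_SENT gets closed instead of hanging the process.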

This just in:
[Kernel]
Sep 13 13:43:09 idemo1 kernel: tcp_timer.c: tcp_retransmit_timer: [00491055]
199.72.252.1:1181 -> 205.149.163.212:2056 Syn Tm: 450
Sep 13 13:43:09 idemo1 kernel: tcp_input.c: tcp_rcv_state_process(): [0049105F]
199.72.252.1:1181 -> 205.149.163.212:2056 Syn: Ack-Syn [reset]
Sep 13 13:43:15 idemo1 kernel: tcp_timer.c: tcp_retransmit_timer: [004912BA]
199.72.252.1:1235 -> 134.173.46.146:2056 Syn Tm: 450
Sep 13 13:43:19 idemo1 kernel: tcp_timer.c: tcp_retransmit_timer: [0049147C]
199.72.252.1:1235 -> 134.173.46.146:2056 Syn Tm: 675
Sep 13 13:43:26 idemo1 kernel: tcp_timer.c: tcp_retransmit_timer: [0049171F]
199.72.252.1:1235 -> 134.173.46.146:2056 Syn Tm: 750
Sep 13 13:43:26 idemo1 kernel: tcp_input.c: tcp_rcv_state_process(): [00491743]
199.72.252.1:1235 -> 134.173.46.146:2056 Syn: Ack-Syn [reset]
[Bovine:proc2]
[09/13/97 17:43:06 GMT] Completed block 40F273:20000000 (268435456 keys)
00:16:46.87 - [266603.00 keys/sec]
Network::Open Error - Sleeping for 3 seconds
Network::Open Error - Sleeping for 3 seconds
The proxy says: "Bruteforce is the key! (bruteforce.st.hmc.edu)"
>>>>Network::Error Read Failed 27/0
[09/13/97 17:44:37 GMT] Block: 40F272:00000000 being processed
...

[And for those looking at that and wondering about the timeout times, yes, I
changed the progression, and I'm likely to make it even more aggressive.]

--Ricky