Re: TCP BUG ??? (patch)

jamal (hadi@cyberus.ca)
Sat, 18 Apr 1998 07:24:56 -0400 (EDT)


OK, finally resolved. It is not a big deal really, but it had to be nailed
down and fixed if necessary.

Full details (patience required -- long and verbose!)

Thanks to Alan Cox mainly, and Dave Miller for some insights.

Executive summary of the problem:
--------------------------------

1) Linux doesn't handle all bad TCP data segments appropriately
(data arriving in the wrong state is not handled gracefully)

2) Linux doesn't handle all bad TCP FINs appropriately
(again, these are FINs arriving in the wrong state)

This patch also limits the number of received FINs that we act on
(a FIN arrival normally forces us to change state, e.g. from
TCP_ESTABLISHED to TCP_CLOSE_WAIT).
Reason: there might be a broken stack out there which sends
several FINs and expects ACKs for them; the patch allows a fixed number
of FINs to be responded to (currently 3, for no particular reason) and
then starts ignoring any new incoming ones.
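
A minimal sketch of the cap described above, written as plain userspace C
rather than kernel code. Only the fin_rcvd_cnt field and the limit of 3
come from the patch; toy_sock, toy_receive_fin and MAX_FINS_HONOURED are
made-up names for illustration.

```c
#include <stdbool.h>

/* Hypothetical name; the patch hard-codes the same limit as "> 2". */
#define MAX_FINS_HONOURED 3

/* Toy stand-in for the single field the patch adds to struct sock. */
struct toy_sock {
    unsigned short fin_rcvd_cnt;   /* count of FINs acted on so far */
};

/*
 * Called for every incoming FIN.
 * Returns true if the FIN is acted on, false if it is silently ignored
 * because the cap has already been reached.
 */
static bool toy_receive_fin(struct toy_sock *sk)
{
    if (sk->fin_rcvd_cnt >= MAX_FINS_HONOURED)
        return false;              /* cap reached: ignore the segment */
    sk->fin_rcvd_cnt++;
    return true;
}
```

With a zeroed toy_sock, the first three calls return true and every later
call returns false, which is the behavior the patch aims for.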

The problem is easily reproducible using netscape. Apparently,
netscape keeps its ftp control channel open in case you go back
there to download more stuff.

The details with tcpdump traces:
-------------------------------

At this point we are in the ESTABLISHED state for the
remaining channel/port (of the two that ftp uses), as shown by netstat -t
i.e.
[inetfw.sonycsl.co.jp.ftp <--data exchange--> pc-14520.1029]

[it seems we stay there for about 15 minutes with no data exchange]

The remote ftp server sends a "you've been idle too long"
message (it issues a data segment, then a FIN -- forcing us to go into
CLOSE_WAIT after ACKing)

inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
P 646917062:646917118(56) ack 503649480 win 52560 (DF) [tos 0x10]
inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
F 56:56(0) ack 1 win 52560 (DF) [tos 0x10]
pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744
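
The numbers in the trace above can be checked by hand: the data segment
carries 56 bytes, the FIN carries none but still occupies one sequence
number, so our ACK comes out as 57 (relative). A small sketch of that
arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Sequence space used by a segment: payload bytes, plus one each for
 * SYN and FIN, which both occupy a sequence number. */
static uint32_t seg_seq_len(uint32_t payload_len, int syn, int fin)
{
    return payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}

/* Sequence number the receiver acknowledges after a segment starting
 * at seq; unsigned arithmetic wraps modulo 2^32 like TCP does. */
static uint32_t next_expected(uint32_t seq, uint32_t payload_len,
                              int syn, int fin)
{
    return seq + seg_seq_len(payload_len, syn, fin);
}
```

For the trace: next_expected(0, 56, 0, 0) gives 56, where the FIN starts,
and next_expected(56, 0, 0, 1) gives 57, the value we ACK. A single
segment carrying the same 56 bytes together with the FIN lands on 57 too.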

We are now in the CLOSE_WAIT state, as shown by netstat -t
i.e.
[inetfw.sonycsl.co.jp.ftp <-- data sent -- pc-14520.1029]

inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
FP 0:56(56) ack 1 win 52560 (DF) [tos 0x10]

********** This segment carries both an invalid FIN and invalid data, yet
we respond to it (below) and queue the data in the recvq, as indicated by
netstat.

pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744

RFC 793 says you shouldn't accept data in this state, but should
still process the FIN.
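
As a rough sketch of the two RFC 793 rules involved (segment text in step
7, the FIN bit in step 8), the per-state decisions can be written as two
independent predicates. The enum and function names are made up for this
example. States that step 7 does not list are treated as "do not accept"
here, which is an assumption of the sketch.

```c
#include <stdbool.h>

enum toy_state {
    ST_CLOSED, ST_LISTEN, ST_SYN_SENT, ST_SYN_RECV, ST_ESTABLISHED,
    ST_FIN_WAIT1, ST_FIN_WAIT2, ST_CLOSE_WAIT, ST_CLOSING,
    ST_LAST_ACK, ST_TIME_WAIT
};

/* Step 7: segment text is accepted only in ESTABLISHED, FIN-WAIT-1 and
 * FIN-WAIT-2. In CLOSE-WAIT, CLOSING, LAST-ACK and TIME-WAIT the peer
 * has already sent a FIN, so the text is ignored. */
static bool accepts_text(enum toy_state s)
{
    return s == ST_ESTABLISHED || s == ST_FIN_WAIT1 || s == ST_FIN_WAIT2;
}

/* Step 8: the FIN is not processed in CLOSED, LISTEN or SYN-SENT; in
 * every other state it is looked at, even if the state does not change. */
static bool processes_fin(enum toy_state s)
{
    return s != ST_CLOSED && s != ST_LISTEN && s != ST_SYN_SENT;
}
```

For the segment above, accepts_text(ST_CLOSE_WAIT) is false while
processes_fin(ST_CLOSE_WAIT) is true: drop the 56 bytes, still look at
the FIN.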

---------------
Two weird ACKs follow about two minutes later

inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 1 win 52560 (DF) [tos 0x10]
pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744 (DF)
inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 1 win 52560 (DF) [tos 0x10]

---------
Two minutes later:
Now notice a repetition of this illegal activity: a bunch of
FINs, which we ACK, and then get ACKed for our ACKs. This
keeps looping indefinitely, until the netscape client gets
killed.
The FINs are two minutes apart, which leads me to believe
that the NEWS_OS is in the TIME_WAIT state; which means that
if we keep ACKing it, it will reset its timer and repeat
what it did last time. Or should it just discard any
incoming packets?

inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
F 56:56(0) ack 1 win 52560 (DF) [tos 0x10]
pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744
inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 1 win 52560 (DF) [tos 0x10]

inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
F 56:56(0) ack 1 win 52560 (DF) [tos 0x10]
10:01:17.936196 pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744
10:01:18.326196 inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 1 win 52560 (DF) [tos 0x10]

10:03:17.956196 inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
F 56:56(0) ack 1 win 52560 (DF) [tos 0x10]
10:03:17.956196 pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
. ack 57 win 31744
10:03:18.346196 inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 1 win 52560 (DF) [tos 0x10]

This loop continues forever, as long as netscape is up.
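
A toy model of the loop, under the guess made above: the far side
re-sends its FIN each time some timer fires, and re-arms that timer only
when it sees our ACK. What the far side really does when it gets no ACK
is not known; the function name, the parameters and the "loop stops
without an ACK" behavior are all assumptions of this sketch.

```c
/*
 * Counts how many FINs the far side sends within window_secs, one every
 * period_secs, if we answer only the first ack_budget of them.
 * Hypothetical model: an unanswered FIN does not re-arm the timer, so
 * the loop ends there.
 */
static unsigned toy_count_fin_rounds(unsigned period_secs,
                                     unsigned window_secs,
                                     unsigned ack_budget)
{
    unsigned rounds = 0;
    unsigned t;

    if (period_secs == 0)
        return 0;                    /* avoid spinning on a zero period */

    for (t = 0; t < window_secs; t += period_secs) {
        rounds++;                    /* far side sends a FIN */
        if (rounds > ack_budget)     /* we stay silent this time */
            break;
    }
    return rounds;
}
```

With a 120 second period, a 1560 second window and an effectively
unlimited budget, the model yields 13 FINs. With a budget of 3 it yields
4: three answered FINs and one that goes unanswered.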

When netscape gets killed we respond appropriately, and
the state transitions are also appropriate:
CLOSE_WAIT->LAST_ACK->CLOSED

10:27:10.156196 pc-14520.1029 > inetfw.sonycsl.co.jp.ftp:
F 1:1(0) ack 57 win 31744

We got into LAST_ACK

10:27:10.566196 inetfw.sonycsl.co.jp.ftp > pc-14520.1029:
. ack 2 win 52560 (DF) [tos 0x10]

The connection is closed smoothly.
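
The passive-close path seen in this trace can be sketched as a tiny state
machine. This is an illustration only, not the kernel's code; the enum
and function names are hypothetical, and only the four states on this
path are modelled.

```c
enum pc_state { PC_ESTABLISHED, PC_CLOSE_WAIT, PC_LAST_ACK, PC_CLOSED };

enum pc_event {
    EV_RCV_FIN,          /* peer's FIN arrives */
    EV_APP_CLOSE,        /* local application closes; we send our FIN */
    EV_RCV_ACK_OF_FIN    /* peer ACKs our FIN */
};

/* Returns the next state; events that do not apply leave it unchanged. */
static enum pc_state pc_next(enum pc_state s, enum pc_event e)
{
    switch (s) {
    case PC_ESTABLISHED:
        return e == EV_RCV_FIN ? PC_CLOSE_WAIT : s;
    case PC_CLOSE_WAIT:
        /* a repeated FIN here must not move us anywhere */
        return e == EV_APP_CLOSE ? PC_LAST_ACK : s;
    case PC_LAST_ACK:
        return e == EV_RCV_ACK_OF_FIN ? PC_CLOSED : s;
    default:
        return s;
    }
}
```

Feeding it EV_RCV_FIN, then any number of further EV_RCV_FIN events, then
EV_APP_CLOSE and EV_RCV_ACK_OF_FIN walks through
ESTABLISHED, CLOSE_WAIT, LAST_ACK and CLOSED in that order.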

----------------------------------------------------------
If anyone is interested in reproducing this bug, here's how
(please do this only if it is necessary, so the admin at sony
doesn't get upset)

- start up tcpdump host inetfw.sonycsl.co.jp
- using netscape connect to ftp://inetfw.sonycsl.co.jp/pub/Network/
- download a file named arns.tar.Z
- when the download is done, start up netstat -tc
- watch the tcpdump output as well as netstat, and you should see the
weird behavior explained above

Basically, wait until the state in netstat goes from ESTABLISHED to
CLOSE_WAIT and leave it for about 15 minutes after that, while still
monitoring the behavior in tcpdump. After that, wait a few more minutes
and observe the loop that repeats every 2 minutes.

This problem is also partially present in the 2.1.* code; I plan on
patching it. It would probably have been much easier to fix if the
processing of the FIN (RFC 793 step 8) had been separated from that of
the data segment (RFC 793 step 7), instead of the tight coupling that is
there now.

The patch (with some debug info); badly diffed:
--------------------------------------------

--- sock.h.original Fri Apr 17 20:57:42 1998
+++ sock.h Fri Apr 17 20:57:35 1998
@@ -24,6 +24,7 @@
* Alan Cox : Eliminate low level recv/recvfrom
* David S. Miller : New socket lookup architecture for ISS.
* Elliot Poger : New field for SO_BINDTODEVICE option.
+ * J Hadi Salim : New field for counting rcvd FINs
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -176,6 +177,7 @@
__u32 urg_seq;
__u32 urg_data;
__u32 syn_seq;
+ unsigned short fin_rcvd_cnt; /* count of rcvd fins */
int users; /* user count */
/*
* Not all are volatile, but some are, so we
--- tcp_input.c.original Fri Apr 17 20:57:19 1998
+++ tcp_input.c Fri Apr 17 20:57:11 1998
@@ -36,6 +36,10 @@
* Elliot Poger : Added support for SO_BINDTODEVICE.
* Willy Konynenberg : Transparent proxy adapted to new
* socket hash code.
+ * J Hadi Salim : Glitches in tcp_data/fin/rcv; more
+ * conformance with 793 in reaction
+ * to arrivals of unexpected FINS/data
+ *
*/

#include <linux/config.h>
@@ -43,6 +47,9 @@
#include <linux/random.h>
#include <net/tcp.h>

+/* for debugging */
+#define print_ip(a) printk(KERN_DEBUG "%ld.%ld.%ld.%ld\n",(ntohl(a)>>24)&0xFF,(ntohl(a)>>16)&0xFF,(ntohl(a)>>8)&0xFF,(ntohl(a))&0xFF);
+
/*
* Policy code extracted so it's now separate
*/
@@ -704,6 +711,7 @@
newsk->delay_acks = 1;
newsk->copied_seq = skb->seq+1;
newsk->fin_seq = skb->seq;
+ newsk->fin_rcvd_cnt = 0;
newsk->syn_seq = skb->seq;
newsk->state = TCP_SYN_RECV;
newsk->timeout = 0;
@@ -974,6 +982,7 @@
newsk->delay_acks = 1;
newsk->copied_seq = skb->seq;
newsk->fin_seq = skb->seq-1;
+ newsk->fin_rcvd_cnt = 0;
newsk->syn_seq = skb->seq-1;
newsk->state = TCP_SYN_RECV;
newsk->timeout = 0;
@@ -1746,6 +1755,7 @@
static int tcp_fin(struct sk_buff *skb, struct sock *sk, struct tcphdr *th)
{
sk->fin_seq = skb->end_seq;
+ sk->fin_rcvd_cnt++;

if (!sk->dead)
{
@@ -1756,7 +1766,6 @@
switch(sk->state)
{
case TCP_SYN_RECV:
- case TCP_SYN_SENT:
case TCP_ESTABLISHED:
/*
* move to CLOSE_WAIT, tcp_data() already handled
@@ -1767,13 +1776,6 @@
sk->shutdown = SHUTDOWN_MASK;
break;

- case TCP_CLOSE_WAIT:
- case TCP_CLOSING:
- /*
- * received a retransmission of the FIN, do
- * nothing.
- */
- break;
case TCP_TIME_WAIT:
/*
* received a retransmission of the FIN,
@@ -1821,16 +1823,15 @@
tcp_set_state(sk,TCP_TIME_WAIT);
break;
case TCP_CLOSE:
- /*
- * already in CLOSE
- */
- break;
+ case TCP_SYN_SENT:
+ case TCP_CLOSE_WAIT:
+ case TCP_LAST_ACK:
+ case TCP_CLOSING:
+ case TCP_LISTEN:
default:
- tcp_set_state(sk,TCP_LAST_ACK);
+ /* bad states -- ignore the FIN */
+ return(1);

- /* Start the timers. */
- tcp_reset_msl_timer(sk, TIME_CLOSE, TCP_TIMEWAIT_LEN);
- return(0);
}

return(0);
@@ -1872,7 +1873,12 @@
*/
skb->acked = 1;
if (skb->h.th->fin)
- tcp_fin(skb,sk,skb->h.th);
+ {
+ if (tcp_fin(skb,sk,skb->h.th))
+ sk->delay_acks=1;
+ else /* it is a good FIN */
+ sk->delay_acks=0;
+ }
return skb->end_seq;
}

@@ -1925,10 +1931,10 @@

/*
* Delay the ack if possible. Send ack's to
- * fin frames immediately as there shouldn't be
+ * good fin frames immediately as there shouldn't be
* anything more to come.
*/
- if (!sk->delay_acks || th->fin) {
+ if (!sk->delay_acks ) {
tcp_send_ack(sk);
} else {
/*
@@ -2000,6 +2006,16 @@
struct tcphdr *th;
u32 new_seq, shut_seq;

+/* 3 FINs have already been received on this connection just
+ignore the segment. There is nothing magical about the number
+3; maybe someone can derive a more scientific estimate. This caps
+the number of times FINS arriving in the wrong states
+*/
+ if (sk->fin_rcvd_cnt > 2)
+ {
+ return(1);
+ }
+
th = skb->h.th;
skb_pull(skb,th->doff*4);
skb_trim(skb,len-(th->doff*4));
@@ -2009,7 +2025,10 @@
* low memory discard algorithm
*/

- sk->bytes_rcv += skb->len;
+
+ /* the only flag that hasnt been processed thus far is the
+ * fin flag; if it is a good FIN ack NOW it else dont
+ * */

if (skb->len == 0 && !th->fin)
{
@@ -2019,10 +2038,36 @@
*/
if (!th->ack)
tcp_send_ack(sk);
- kfree_skb(skb, FREE_READ);
- return(0);
+ return(1);
}

+/*
+* Dont process any *received* data if the system is in the
+following states: CLOSE_WAIT, CLOSING, LAST_ACK ,
+and TIME_WAIT state
+We should still be interested in the FIN;
+The RFC is not clear about the following states:
+TCP_SYN_SENT, TCP_LISTEN, TCP_SYN_RECV and TCP_CLOSE
+*/
+
+ if ((sk->state == TCP_LAST_ACK) || (sk->state == TCP_CLOSE_WAIT) || (sk->state == TCP_CLOSING)|| (sk->state == TCP_TIME_WAIT))
+ {
+
+ if (th->fin)
+ {
+ tcp_fin(skb, sk, th);
+ tcp_send_ack(sk);
+
+ if (skb->len) /* bad data if in these states */
+ {
+ printk(KERN_DEBUG "tcp_data: bad data rcvd source=");
+ print_ip(saddr);
+ return(1);
+ }
+ }
+ }
+
+ sk->bytes_rcv += skb->len;

/*
* We no longer have anyone receiving data on this connection.
@@ -2724,6 +2769,7 @@

/*
* Process the encapsulated data
+ * RFC step 7 and 8 tightly coupled here;
*/

if(tcp_data(skb,sk, saddr, len))
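
For anyone who wants to try the address formatting done by the print_ip
debug macro outside the kernel, here is a userspace sketch. It is not
part of the patch: toy_format_ip is a made-up name, and it uses snprintf
in place of printk.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes an IPv4 address given in network byte order into buf as a
 * dotted quad. Returns what snprintf returns: the length the full
 * string needs, not counting the terminating NUL.
 */
static int toy_format_ip(char *buf, size_t len, uint32_t addr_be)
{
    uint32_t a = ntohl(addr_be);   /* to host order, as the macro does */

    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)((a >> 24) & 0xFF),
                    (unsigned)((a >> 16) & 0xFF),
                    (unsigned)((a >> 8) & 0xFF),
                    (unsigned)(a & 0xFF));
}
```

A 16 byte buffer is enough for any IPv4 address in this form, including
the terminating NUL.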

---------- Forwarded message ----------
Date: Wed, 8 Apr 1998 17:37:43 -0400 (EDT)
From: jamal <hadi@cyberus.ca>
To: linux-net@vger.rutgers.edu
Cc: linux-kernel@vger.rutgers.edu
Subject: TCP BUG ???

I posted this about a week ago but no-one responded

[stuff deleted]
