TCP: user-setable irtt improperly handled (patch included)

Matthias Welwarsky (dg2fef@dg2fef-2.ampr.org)
Thu, 30 Jan 1997 22:06:33 +0100


This is a multi-part message in MIME format.

--------------1665B48E7436DB21238B90AC
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi all.

The current TCP implementation allows a per-route preset of the initial
retransmit timeout (or is it "initial round trip time"?) for all
outgoing TCP connections. However, support for this feature seems to
be incomplete, because the preset is used to set the RTT estimator only
*once*, when a connection is initiated. Whenever the RTT estimator is
reset later during protocol handling, the standard TCP_TIMEOUT_INIT is
used, which is much too short for slow links (e.g. IP via AX.25). This
results, for example, in extremely annoying retransmissions immediately
after the connection is established. Additionally, the preset
could/should also be used on incoming connections.
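
To illustrate the problem, here is a minimal stand-alone sketch in C. It is
a toy model only, not kernel code: every name in it (toy_sock, toy_route,
TOY_TIMEOUT_INIT and its value of 300 ticks) is made up for illustration.
It contrasts a reset that always falls back to the compiled-in default with
one that falls back to a value remembered at connection setup.

```c
/* Toy model of the RTT-estimator reset; all names are hypothetical. */
#define TOY_TIMEOUT_INIT 300UL      /* assumed default initial timeout, in ticks */

struct toy_route {
    int has_irtt;                   /* nonzero if the user set an irtt on the route */
    unsigned long irtt;             /* per-route preset, in ticks */
};

struct toy_sock {
    unsigned long rtt;              /* smoothed round trip time estimate */
    unsigned long mdev;             /* mean deviation of the estimate */
    unsigned long rto;              /* current retransmit timeout */
    unsigned long rto_init;         /* value to fall back to on a reset */
};

/* Pick the initial timeout once at connection setup and remember it. */
static void toy_connect(struct toy_sock *sk, const struct toy_route *rt)
{
    unsigned long init = (rt && rt->has_irtt) ? rt->irtt : TOY_TIMEOUT_INIT;

    sk->rtt = 0;
    sk->mdev = init;
    sk->rto = init;
    sk->rto_init = init;
}

/* Reset that ignores the route preset (the behaviour described above). */
static void toy_reset_default(struct toy_sock *sk)
{
    sk->rtt = 0;
    sk->rto = TOY_TIMEOUT_INIT;
    sk->mdev = TOY_TIMEOUT_INIT;
}

/* Reset that falls back to the remembered per-connection value. */
static void toy_reset_remembered(struct toy_sock *sk)
{
    sk->rtt = 0;
    sk->rto = sk->rto_init;
    sk->mdev = sk->rto_init;
}
```

With a slow route whose preset is, say, 3000 ticks, toy_reset_default()
drops the timeout back to 300 ticks right after the handshake, while
toy_reset_remembered() keeps it at 3000.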

The included patch should fix this; it is against a clean 2.0.28. Could
someone on this list please test it? It works perfectly for me, and as it
changes only a few lines of code it should not introduce new bugs, but
who knows...

The main reason I'd like someone to take a look at it is that I
had to add a field to "struct sock" in order to make it look clean and
nice, and thus it will affect networking in general (well, just a bit).

--
73s de Matthias

--------------1665B48E7436DB21238B90AC
Content-Type: text/plain; charset=us-ascii; name="diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="diff"

diff -ru -x .hdepend -x *.[aos] -x *~ -x .depend -x System.map -x *.orig linux-2.0.28/net/ipv4/tcp.c linux-2.0.28a/net/ipv4/tcp.c
--- linux-2.0.28/net/ipv4/tcp.c	Wed Nov 27 08:44:11 1996
+++ linux-2.0.28a/net/ipv4/tcp.c	Mon Jan 27 14:02:21 1997
@@ -2083,10 +2083,16 @@
 
 	tcp_cache_zap();
 	tcp_set_state(sk,TCP_SYN_SENT);
-	if(rt&&rt->rt_flags&RTF_IRTT)
+	if(rt&&rt->rt_flags&RTF_IRTT) {
 		sk->rto = rt->rt_irtt;
-	else
+		sk->mdev = rt->rt_irtt;
+		sk->rto_init = rt->rt_irtt;
+	} else {
 		sk->rto = TCP_TIMEOUT_INIT;
+		sk->mdev = TCP_TIMEOUT_INIT;
+		sk->rto_init = TCP_TIMEOUT_INIT;
+	}
+	sk->rtt = 0;
 	sk->delack_timer.function = tcp_delack_timer;
 	sk->delack_timer.data = (unsigned long) sk;
 	sk->retransmit_timer.function = tcp_retransmit_timer;
diff -ru -x .hdepend -x *.[aos] -x *~ -x .depend -x System.map -x *.orig linux-2.0.28/net/ipv4/tcp_input.c linux-2.0.28a/net/ipv4/tcp_input.c
--- linux-2.0.28/net/ipv4/tcp_input.c	Sat Nov 30 11:51:03 1996
+++ linux-2.0.28a/net/ipv4/tcp_input.c	Mon Jan 27 13:34:39 1997
@@ -446,6 +446,7 @@
 	newsk->rtt = 0;
 	newsk->rto = TCP_TIMEOUT_INIT;
 	newsk->mdev = TCP_TIMEOUT_INIT;
+	newsk->rto_init = TCP_TIMEOUT_INIT;
 	newsk->max_window = 0;
 	/*
 	 * See draft-stevens-tcpca-spec-01 for discussion of the
@@ -549,9 +550,14 @@
 
 	if (sk->user_mss)
 		newsk->mtu = sk->user_mss;
-	else if (rt)
+	else if (rt) {
 		newsk->mtu = rt->rt_mtu - sizeof(struct iphdr) - sizeof(struct tcphdr);
-	else
+		if (rt->rt_irtt) {
+			newsk->rto = rt->rt_irtt;
+			newsk->mdev = rt->rt_irtt;
+			newsk->rto_init = rt->rt_irtt;
+		}
+	} else
 		newsk->mtu = 576 - sizeof(struct iphdr) - sizeof(struct tcphdr);
 
 	/*
@@ -728,7 +734,7 @@
 	 * We don't want too many packets out there.
 	 */
 
-	if (sk->ip_xmit_timeout == TIME_WRITE &&
+	if (sk->ip_xmit_timeout == TIME_WRITE &&
 		sk->cong_window < 2048 && after(ack, sk->rcv_ack_seq))
 	{
 
@@ -836,7 +842,7 @@
 		sk->rcv_ack_seq = ack;
 		sk->rcv_ack_cnt = 1;
 	}
-
+
 	/*
 	 * We passed data and got it acked, remove any soft error
 	 * log. Something worked...
@@ -1178,10 +1184,13 @@
 			/* Reset the RTT estimator to the initial
 			 * state rather than testing to avoid
 			 * updating it on the ACK to the SYN packet.
+			 *
+			 * MW: this was sk->rto = TCP_TIMEOUT_INIT, which is
+			 * wrong if we have a valid preset for rt_irtt
 			 */
 			sk->rtt = 0;
-			sk->rto = TCP_TIMEOUT_INIT;
-			sk->mdev = TCP_TIMEOUT_INIT;
+			sk->rto = sk->rto_init;
+			sk->mdev = sk->rto_init;
 		}
 
 	/*
@@ -2004,8 +2013,8 @@
 			 * updating it on the ACK to the SYN packet.
 			 */
 			sk->rtt = 0;
-			sk->rto = TCP_TIMEOUT_INIT;
-			sk->mdev = TCP_TIMEOUT_INIT;
+			sk->rto = sk->rto_init;
+			sk->mdev = sk->rto_init;
 		}
 		else
 		{
diff -ru -x .hdepend -x *.[aos] -x *~ -x .depend -x System.map -x *.orig linux-2.0.28/net/ipv4/tcp_output.c linux-2.0.28a/net/ipv4/tcp_output.c
--- linux-2.0.28/net/ipv4/tcp_output.c	Mon Sep 2 14:18:26 1996
+++ linux-2.0.28a/net/ipv4/tcp_output.c	Sun Jan 26 14:38:31 1997
@@ -854,7 +854,7 @@
 	buff->csum = csum_partial(ptr, 4, 0);
 	tcp_send_check(t1, newsk->saddr, newsk->daddr, sizeof(*t1)+4, buff);
 	newsk->prot->queue_xmit(newsk, ndev, buff, 0);
-	tcp_reset_xmit_timer(newsk, TIME_WRITE , TCP_TIMEOUT_INIT);
+	tcp_reset_xmit_timer(newsk, TIME_WRITE , newsk->rto);
 	skb->sk = newsk;
 
 	/*
diff -ru -x .hdepend -x *.[aos] -x *~ -x .depend -x System.map -x *.orig linux-2.0.28/include/net/sock.h linux-2.0.28a/include/net/sock.h
--- linux-2.0.28/include/net/sock.h	Fri Jan 24 11:33:50 1997
+++ linux-2.0.28a/include/net/sock.h	Mon Jan 27 12:54:56 1997
@@ -235,6 +235,7 @@
 	volatile unsigned long	rtt;
 	volatile unsigned long	mdev;
 	volatile unsigned long	rto;
+	volatile unsigned long	rto_init; /* MW: TCP_TIMEOUT_INIT or rt_irtt */
 
 /*
  * currently backoff isn't used, but I'm maintaining it in case

--------------1665B48E7436DB21238B90AC--