Problems with TCP timestamps and SACK

Savochkin Andrey Vladimirovich (saw@msu.ru)
Fri, 10 Apr 1998 19:49:53 +0400


--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii

Hi,

I've found and fixed two problems with the new TCP features.

The first problem concerns RTT estimation using timestamps.
The symptom is that the response time on ssh connections,
even on a local network, can rise to a few dozen seconds.
It is caused by an incorrect RTT calculation that leads
to an unreasonably large retransmit timeout.
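
To illustrate how a single bad sample can inflate the timeout: in the
attached trace, the segment sent by the client at 10:43:32.574 echoes
TSecr 5626318, while the server's own clock is near 5627569 (the TSval
of its next segment). That makes jiffies - rcv_tsecr roughly 1250
jiffies, about 12.5 s at the 100 Hz tick rate the trace implies, on a
path whose real round trip is a jiffy or less. The sketch below feeds
such a sample into a textbook smoothed-RTT estimator
(RTO = SRTT + 4 * RTTVAR). It is only an illustration: the struct,
the function names and the floating-point arithmetic are assumptions
and are not the kernel's tcp_rtt_estimator().

```c
#include <assert.h>

/* Hypothetical estimator state; all times are in jiffies. */
struct rtt_est {
    double srtt;    /* smoothed round-trip time */
    double rttvar;  /* smoothed mean deviation */
    int    valid;   /* 0 until the first sample has been seen */
};

static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Feed one RTT sample and return the resulting retransmit timeout. */
static double rtt_update(struct rtt_est *e, double sample)
{
    assert(sample >= 0.0);
    if (!e->valid) {
        e->srtt = sample;
        e->rttvar = sample / 2.0;
        e->valid = 1;
    } else {
        e->rttvar = 0.75 * e->rttvar + 0.25 * abs_diff(e->srtt, sample);
        e->srtt = 0.875 * e->srtt + 0.125 * sample;
    }
    return e->srtt + 4.0 * e->rttvar;
}

/* RTO after n_good samples of 1 jiffy followed by one more sample. */
static double rto_after_last_sample(int n_good, double last_sample)
{
    struct rtt_est e = {0};
    double rto = 0.0;
    int i;

    for (i = 0; i < n_good; i++)
        rto = rtt_update(&e, 1.0);
    rto = rtt_update(&e, last_sample);
    return rto;
}
```

With ten 1-jiffy samples followed by one 1251-jiffy sample,
rto_after_last_sample(10, 1251.0) comes out above 1000 jiffies, i.e.
more than ten seconds from a single idle gap.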

The corresponding tcpdump output is attached.

The origin of the problem is a missing case in the RTT estimation code:
inter-packet delays caused by the absence of data to send should not be
counted in RTT calculations.
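
The rule applied by the attached patch (take a timestamp-based RTT
sample only when the ACK acknowledges new data) can be sketched in
isolation as follows. This is a simplified stand-in, not the kernel
code: the patch tests FLAG_DATA_ACKED, while this sketch compares the
ACK number against snd_una directly, and the function names are made
up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a is later than b" for 32-bit sequence numbers. */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Returns 1 and stores the sample in *rtt if the ACK advances the left
 * edge of the send window (snd_una); returns 0 and leaves *rtt alone
 * otherwise.  now and tsecr are in jiffies.
 */
static int rttm_take_sample(uint32_t now, uint32_t tsecr,
                            uint32_t snd_una, uint32_t ack, uint32_t *rtt)
{
    assert(rtt != 0);
    if (!seq_after(ack, snd_una))
        return 0;           /* no new data acknowledged: skip */
    *rtt = now - tsecr;
    return 1;
}
```

Taking sequence numbers from the attached trace: an ACK for 320461
while snd_una is still 320461 yields no sample, whatever its TSecr is,
whereas the final ACK for 320837 does. The value of now has to be
assumed in both cases, because the server's clock at receipt does not
appear in the trace.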

The second problem is a bug in clearing the SACK queue, as shown by tcpdump:

16:18:03.734915 193.232.112.103.1069 > castle.nmd.msu.ru.22: . ack 4106046548 win 31856 <nop,nop,timestamp 9392460 22689,nop,nop,sack 4106046568..4106046588> (DF) [tos 0x10]

16:18:03.846247 193.232.112.103.1069 > castle.nmd.msu.ru.22: P 4000765003:4000765023(20) ack 4106046548 win 31856 <nop,nop,timestamp 9392472 22689,nop,nop,sack 4106046568..4106046588> (DF) [tos 0x10]

16:18:03.851996 castle.nmd.msu.ru.22 > 193.232.112.103.1069: P 4106046548:4106046588(40) ack 4000765023 win 31856 <nop,nop,timestamp 22701 9392472> (DF) [tos 0x10]

16:18:03.853472 193.232.112.103.1069 > castle.nmd.msu.ru.22: . ack 4106046588 win 31856 <nop,nop,timestamp 9392472 22701,nop,nop,sack 4106046568..4106046588> (DF) [tos 0x10]

(The SACK block in the last line is stale: that segment already
acknowledges 4106046588, the right edge of the block.)
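
The effect of the one-line SACK change in the attached patch can be
checked against the numbers in this trace. The sketch below copies the
old and new conditions into two stand-alone functions. The names
old_match and new_match are made up for the illustration, and before()
and after() are re-implemented here with the usual wrap-safe
signed-difference comparison rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe comparisons for 32-bit sequence numbers. */
static int before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int after(uint32_t a, uint32_t b)  { return before(b, a); }

/* Condition before the patch: SACK start at or before the segment start. */
static int old_match(uint32_t sack_start, uint32_t seq, uint32_t end_seq)
{
    assert(before(seq, end_seq));   /* segment must carry data */
    return !after(sack_start, seq) && before(sack_start, end_seq);
}

/* Condition after the patch: SACK start lies inside [seq, end_seq). */
static int new_match(uint32_t sack_start, uint32_t seq, uint32_t end_seq)
{
    assert(before(seq, end_seq));   /* segment must carry data */
    return !before(sack_start, seq) && before(sack_start, end_seq);
}
```

For the incoming segment 4106046548:4106046588 and the queued block
starting at 4106046568, old_match() returns 0 and new_match() returns 1.
So the old test never selects this block for trimming, which is
consistent with the stale SACK option in the last ACK.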

The patch fixing both problems is attached.

Regards,
Andrey V. Savochkin

--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=tcpdump-res

10:43:14.803570 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320461:320481(20) ack 42520 win 31856 <nop,nop,timestamp 5625790 7383859> (DF) [tos 0x10]
10:43:14.911488 193.232.112.103.1039 > castle.nmd.msu.ru.22: P 42520:42540(20) ack 320461 win 31800 <nop,nop,timestamp 7383872 5625772> (DF) [tos 0x10]
10:43:14.923421 castle.nmd.msu.ru.22 > 193.232.112.103.1039: . ack 42540 win 31856 <nop,nop,timestamp 5625802 7383872> (DF) [tos 0x10]
10:43:14.933543 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320481:320501(20) ack 42540 win 31856 <nop,nop,timestamp 5625803 7383872> (DF) [tos 0x10]
10:43:14.934224 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320461 win 31800 <nop,nop,timestamp 7383874 5625803,nop,nop,sack 320481..320501> (DF) [tos 0x10]
10:43:15.031545 193.232.112.103.1039 > castle.nmd.msu.ru.22: P 42540:42560(20) ack 320461 win 31800 <nop,nop,timestamp 7383884 5625803,nop,nop,sack 320481..320501> (DF) [tos 0x10]
10:43:15.043421 castle.nmd.msu.ru.22 > 193.232.112.103.1039: . ack 42560 win 31856 <nop,nop,timestamp 5625814 7383884> (DF) [tos 0x10]
10:43:15.053543 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320501:320521(20) ack 42560 win 31856 <nop,nop,timestamp 5625815 7383884> (DF) [tos 0x10]
10:43:15.054222 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320461 win 31800 <nop,nop,timestamp 7383886 5625815,nop,nop,sack 320481..320521> (DF) [tos 0x10]
10:43:19.443149 193.232.112.103.1039 > castle.nmd.msu.ru.22: P 42560:42596(36) ack 320461 win 31800 <nop,nop,timestamp 7384325 5625815,nop,nop,sack 320481..320521> (DF) [tos 0x10]
10:43:19.463155 castle.nmd.msu.ru.22 > 193.232.112.103.1039: . ack 42596 win 31856 <nop,nop,timestamp 5626256 7384325> (DF) [tos 0x10]
10:43:19.473292 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320521:320557(36) ack 42596 win 31856 <nop,nop,timestamp 5626257 7384325> (DF) [tos 0x10]
10:43:19.474008 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320461 win 31800 <nop,nop,timestamp 7384328 5626257,nop,nop,sack 320481..320557> (DF) [tos 0x10]
10:43:20.042395 193.232.112.103.1039 > castle.nmd.msu.ru.22: P 42596:42616(20) ack 320461 win 31800 <nop,nop,timestamp 7384385 5626257,nop,nop,sack 320481..320557> (DF) [tos 0x10]
10:43:20.053112 castle.nmd.msu.ru.22 > 193.232.112.103.1039: . ack 42616 win 31856 <nop,nop,timestamp 5626315 7384385> (DF) [tos 0x10]
10:43:20.083398 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320557:320793(236) ack 42616 win 31856 <nop,nop,timestamp 5626318 7384385> (DF) [tos 0x10]
10:43:20.084521 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320461 win 31800 <nop,nop,timestamp 7384389 5626318,nop,nop,sack 320481..320793> (DF) [tos 0x10]
10:43:32.574437 193.232.112.103.1039 > castle.nmd.msu.ru.22: P 42616:42636(20) ack 320461 win 31800 <nop,nop,timestamp 7385638 5626318,nop,nop,sack 320481..320793> (DF) [tos 0x10]
10:43:32.592297 castle.nmd.msu.ru.22 > 193.232.112.103.1039: . ack 42636 win 31856 <nop,nop,timestamp 5627569 7385638> (DF) [tos 0x10]
10:43:32.602444 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320793:320837(44) ack 42636 win 31856 <nop,nop,timestamp 5627570 7385638> (DF) [tos 0x10]
10:43:32.603176 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320461 win 31800 <nop,nop,timestamp 7385640 5627570,nop,nop,sack 320481..320837> (DF) [tos 0x10]
10:44:51.957041 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320461:320501(40) ack 42636 win 31856 <nop,nop,timestamp 5635506 7385640> (DF) [tos 0x10]
10:46:02.952392 castle.nmd.msu.ru.22 > 193.232.112.103.1039: P 320461:320521(60) ack 42636 win 31856 <nop,nop,timestamp 5642606 7385640> (DF) [tos 0x10]
10:46:02.968267 193.232.112.103.1039 > castle.nmd.msu.ru.22: . ack 320837 win 31800 <nop,nop,timestamp 7400675 5642606> (DF) [tos 0x10]

--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="tcp1.patch"

diff -ru linux.orig/net/ipv4/tcp_input.c linux/net/ipv4/tcp_input.c
--- linux.orig/net/ipv4/tcp_input.c Wed Apr 1 12:28:26 1998
+++ linux/net/ipv4/tcp_input.c Fri Apr 10 18:06:29 1998
@@ -667,7 +667,21 @@
static void tcp_ack_saw_tstamp(struct sock *sk, struct tcp_opt *tp,
u32 seq, u32 ack, int flag)
{
- __u32 seq_rtt = (jiffies-tp->rcv_tsecr);
+ __u32 seq_rtt;
+
+ /*
+ RTTM Rule: A TSecr value received in a segment is used to
+ update the averaged RTT measurement only if the segment
+ acknowledges some new data, i.e., only if it advances the
+ left edge of the send window.
+
+ See draft-ietf-tcplw-high-performance-00, section 3.3.
+ 1998/04/10 Andrey V. Savochkin <saw@msu.ru>
+ */
+ if (!(flag & FLAG_DATA_ACKED))
+ return;
+
+ seq_rtt = jiffies-tp->rcv_tsecr;
tcp_rtt_estimator(tp, seq_rtt);
if (tp->retransmits) {
if (tp->packets_out == 0) {
@@ -683,8 +697,7 @@
}
} else {
tcp_set_rto(tp);
- if (flag & FLAG_DATA_ACKED)
- tcp_cong_avoid(tp, seq, ack, seq_rtt);
+ tcp_cong_avoid(tp, seq, ack, seq_rtt);
}
/* NOTE: safe here so long as cong_ctl doesn't use rto */
tcp_bound_rto(tp);
@@ -1224,7 +1237,8 @@
* from the front of a SACK.
*/
for(this_sack = 0; this_sack < num_sacks; this_sack++, sp++) {
- if(!after(sp->start_seq, TCP_SKB_CB(skb)->seq) &&
+ /* check if the start of the sack is covered by skb */
+ if(!before(sp->start_seq, TCP_SKB_CB(skb)->seq) &&
before(sp->start_seq, TCP_SKB_CB(skb)->end_seq))
break;
}

--zYM0uCDKw75PZbzx--
