Retransmitted packets ignored?
From: Aleksandar Milivojevic
Date: Wed Dec 08 2004 - 14:56:01 EST
Recently I started to experience rather strange problems with one
monitoring appliaction I have (basically, it just checks if remote IMAP
server is alive). After looking into it, it seems to be problem with
Linux TCP stack.
What happens on TCP level is something like this:
- the three-way handshake goes well
- monitoring app sends IMAP logout command (blindly, without waiting
for the banner from IMAP server)
- remote side responds with banner message (OK blah blah)
- remote side sends response to logout command (* BYE ... ) and sets
FIN flag on this packet, closing the connection
Now, sometimes the link to remote site is flaky, and drops the packet or
two. If it happens to the last two packets, I see the remote Linux box
merging previous two packets into one, and retransmitting (again, with
FIN flag set). However, Linux box where monitoring application runs
seems to ignore those retransmitted packets for whatever reason.
I've checked the sequence numbers and acknowledgement numbers on all
packets (basically, followed entire TCP connection packet by packet,
checking everything is sane using pen and paper), and everything seems
OK (well, other than two dropped packets, but those should have been
taken care of by retransmissions).
Both sides are running 2.4 series kernels. Were there any recent fixes
or updates to TCP protocol stack?
This is how it looks in tcpdump from viewpoint of local machine ("a").
Three way handshake goes well:
a.3994 > b.imap: S 1476546745:1476546745(0) win 5840 <mss
1460,sackOK,timestamp 3097547910 0,nop,wscale 0> (DF)
b.imap > a.3994: S 2105003620:2105003620(0) ack 1476546746 win 5792 <mss
1460,sackOK,timestamp 1015114245 3097547910,nop,wscale 0> (DF)
a.3994 > b.imap: . ack 1 win 5840 <nop,nop,timestamp 3097547914
1015114245> (DF)
Logout command is sent (15 bytes):
a.3994 > b.imap: P 1:16(15) ack 1 win 5840 <nop,nop,timestamp 3097547914
1015114245> (DF)
Ack for previous packet:
b.imap > a.3994: . ack 16 win 5792 <nop,nop,timestamp 1015114248
3097547914> (DF)
At this point, two packets get lost in transmission (148 bytes (banner)
and 91 bytes (response to logout command) of data respectively, visible
from tcpdump that was running on remote side), so on local side I see
only retransmissions for them merged into one packet (239 bytes):
b.imap > a.3994: FP 1:240(239) ack 16 win 5792 <nop,nop,timestamp
1015114276 3097547914> (DF)
b.imap > a.3994: FP 1:240(239) ack 16 win 5792 <nop,nop,timestamp
1015114330 3097547914> (DF)
There's couple of more retransmissions (identical to above two) before
machine "a" times out the connection, and sends FIN ACK=1, remote side
responds with ACK=17 (basically, local machine acknowledged transmission
of zero bytes, and remote side of 15 bytes (the logout command)). At
this point monitoring application concludes something is wrong with IMAP
server (it connected, but there was no response) and sends an alarm.
Another interesting thing I noticed was that remote machine ("b") sent
two or three more retransmissions after it acknowledged connection
closing (FIN ACK, ACK). Shouldn't it given up as soon as it got FIN ACK?
--
Aleksandar Milivojevic <amilivojevic@xxxxxx> Pollard Banknote Limited
Systems Administrator 1499 Buffalo Place
Tel: (204) 474-2323 ext 276 Winnipeg, MB R3T 1L7
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/