Re: TCP keepalive timer problem

From: Eric Dumazet
Date: Thu Aug 27 2009 - 08:45:15 EST


Please don't top-post on these lists; find my answers below.

Li_Xin2@xxxxxxx wrote:
>
> Thanks for your quick reply. Let me explain my problem in detail.
>
> Suppose the client side of the connection sets the keepalive socket option, connects to the
> server, and then we pull out the network cable of the server box. After the connection has been
> idle for TCP_KEEPIDLE seconds, the first keepalive probe is sent, and of course no reply is
> received. Just after the first probe, the client sends some data. No response is received and,
> as you said, normal retransmission takes place and no further keepalive probes are sent.
>
> The problem is: an application that uses the keepalive mechanism expects a peer crash to be
> detected within TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds. The application may set
> relatively small TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL values so that a peer crash is
> detected quickly, for example within 60 seconds. But if keepalive is interleaved with
> retransmission, the latter takes priority, so the peer crash is detected only after
> 13 to 30 minutes, which may not be acceptable for some applications.
>
> We tried the TCP implementation in Windows XP SP3; there, keepalive and retransmission don't interfere.
>
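For reference, the per-socket knobs mentioned above are set with setsockopt(2) on Linux. What follows is a minimal sketch, assuming Linux headers that define TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The helper names and the 30/10/3 values are hypothetical, chosen so that TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL = 60 seconds. With the current stack, that budget only holds while no data is in flight.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: worst-case keepalive detection time in seconds,
 * valid only while the connection has no unacknowledged data in flight. */
static int keepalive_budget(int idle, int intvl, int cnt)
{
	return idle + cnt * intvl;
}

/* Hypothetical helper: enable keepalive on a TCP socket and override the
 * system-wide defaults for this socket only. Returns 0 on success, -1 on
 * the first failing setsockopt() (errno is left set by that call). */
static int enable_fast_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	/* Seconds of idleness before the first probe. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	/* Seconds between unanswered probes. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	/* Unanswered probes before the connection is dropped. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	return 0;
}
```

Calling enable_fast_keepalive(fd, 30, 10, 3) on a TCP socket gives keepalive_budget(30, 10, 3) == 60 seconds of detection time, provided nothing is being retransmitted.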


> Regards,
> Xin Li
> EMC Shanghai R&D Centre
> Email: Li_Xin2@xxxxxxx
> Tel: 86 21 6095 1100 x 2257
>
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@xxxxxxxxx]
> Sent: 25 August 2009 21:13
> To: Li, Xin
> Cc: linux-kernel@xxxxxxxxxxxxxxx; Linux Netdev List
> Subject: Re: TCP keepalive timer problem
>
> Li_Xin2@xxxxxxx wrote:
>> Greetings,
>>
>> I found a problem in Linux TCP keepalive timer processing. After
>> searching on Google, I found that Daniel Stempel reported the same problem in
>> 2007 (http://lkml.indiana.edu/hypermail/linux/kernel/0702.2/1136.html),
>> but got no answer. So I have to raise it again.
>>
>> Can anyone help answer this two-year-old question?
>>
>>
>
> You should explain your problem in detail, since Daniel's was probably different.
>
> He mentioned "(timeout is set to e.g. 30 seconds)", which is kind of nasty, given that the normal one is 7200 seconds.
>
> If some packets are in flight, keepalive does not fire at all, since normal
> retransmissions should take place (check the tcp_retries2 sysctl).
>
> TCP keepalive fires only when no traffic has occurred for a long time, and only if
> the SO_KEEPALIVE socket option was enabled by the application.
>
> tcp_retries2 (integer; default: 15)
> The maximum number of times a TCP packet is retransmitted in established state
> before giving up. The default value is 15, which corresponds to a duration of
> approximately 13 to 30 minutes, depending on the retransmission timeout.
> The RFC 1122 specified minimum limit of 100 seconds is typically deemed too short.
>
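To see where figures like "13 to 30 minutes" come from, here is a simplified model of exponential backoff. Each retransmission timeout doubles and is clamped to a maximum, and the connection is abandoned when the timer armed after the last of tcp_retries2 retransmissions expires. This is a sketch only. It assumes Linux's TCP_RTO_MAX of 120 seconds, treats the initial RTO as a free parameter (200 ms is the TCP_RTO_MIN floor, not a typical measured value), and ignores RTT variation during the backoff.

```c
/* Simplified model of cumulative retransmission time (milliseconds).
 * Assumption: the RTO starts at rto0_ms, doubles after each timeout and
 * is clamped to rto_max_ms; the connection is abandoned when the timer
 * armed after the final (retries-th) retransmission expires, i.e. after
 * retries + 1 timeout intervals in total. */
static long long retransmit_budget_ms(long long rto0_ms, long long rto_max_ms,
				      int retries)
{
	long long total = 0;
	long long rto = rto0_ms;

	for (int i = 0; i <= retries; i++) {
		total += rto;
		rto *= 2;
		if (rto > rto_max_ms)
			rto = rto_max_ms;
	}
	return total;
}
```

With retries = 15 and a 200 ms initial RTO, this gives 924600 ms (about 15.4 minutes). With an assumed 3 s initial RTO, it gives 1389000 ms (about 23 minutes). Both are far below the 7200-second default keepalive interval.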

RFC 1122, section 4.2.3.6, says:

Keep-alive packets MUST only be sent when no data or acknowledgement packets have
been received for the connection within an interval. This interval MUST be
configurable and MUST default to no less than two hours.

So:

Normal tcp_retries2 settings should make sure a connection is reset, when packets in flight are not acknowledged, well before TCP_KEEPIDLE (>= 7200 seconds) expires.
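The RFC rule can be restated as a tiny predicate. This is a minimal sketch with hypothetical names, using plain seconds rather than the kernel's jiffies-based tcp_time_stamp. It relies on the same unsigned-difference idiom as the kernel's elapsed = tcp_time_stamp - tp->rcv_tstamp, so it tolerates counter wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 1122 4.2.3.6: the keepalive interval must default to >= 2 hours. */
#define RFC1122_MIN_KEEPALIVE_DEFAULT_S 7200u

/* Hypothetical check: a keepalive probe may only be sent when nothing
 * (no data, no ACK) has been received for at least interval_s seconds.
 * Unsigned subtraction keeps the result correct across wraparound. */
static bool keepalive_probe_allowed(uint32_t now_s, uint32_t last_rx_s,
				    uint32_t interval_s)
{
	return (uint32_t)(now_s - last_rx_s) >= interval_s;
}
```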


Now, 7200 seconds might be inappropriate for special needs, and considering
there is no way to change tcp_retries2 for a given socket (the only choice being the global
tcp_retries2 setting), I would vote for a change in our stack to *relax* the RFC
and allow smaller keepalive timers where possible.

So when the keepalive timer fires, we should not care about outgoing packets,
only about tp->rcv_tstamp, the timestamp of the last received ACK.
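The change in decision logic can be modelled in user space. This is a simplified sketch with hypothetical names: the real tcp_keepalive_timer() also handles listening sockets, FIN_WAIT2 and resource-probe backoff. The model only contrasts the current "alive without keepalive" shortcut with the proposed rcv_tstamp-only rule.

```c
#include <stdint.h>

/* Hypothetical, simplified model of tcp_keepalive_timer() decisions.
 * Times are in seconds. */
enum ka_action { KA_RESCHED, KA_WAIT, KA_PROBE, KA_RESET };

struct ka_state {
	uint32_t now;          /* current time */
	uint32_t rcv_tstamp;   /* time of last received ACK */
	uint32_t packets_out;  /* unacknowledged segments in flight */
	int has_unsent;        /* data queued but not yet sent */
	int probes_out;        /* keepalive probes sent without reply */
	uint32_t keepidle;     /* TCP_KEEPIDLE */
	int keepcnt;           /* TCP_KEEPCNT */
};

/* proposed != 0 models the patch: outgoing data no longer suppresses probes. */
static enum ka_action keepalive_decide(const struct ka_state *s, int proposed)
{
	uint32_t elapsed;

	/* Current behaviour: "It is alive without keepalive 8)" */
	if (!proposed && (s->packets_out || s->has_unsent))
		return KA_RESCHED;   /* re-arm for a full keepidle, no probe */

	elapsed = s->now - s->rcv_tstamp;
	if (elapsed < s->keepidle)
		return KA_WAIT;      /* re-arm for keepidle - elapsed */
	if (s->probes_out >= s->keepcnt)
		return KA_RESET;     /* peer presumed dead */
	return KA_PROBE;             /* send a probe, re-arm for TCP_KEEPINTVL */
}
```

In the scenario from this thread (data sent after the first probe, peer unplugged), the current rule keeps returning KA_RESCHED while the retransmission is outstanding, so only tcp_retries2 can end the connection. Under the proposed rule, the decision depends only on the time since rcv_tstamp and the probe count.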


diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index b144a26..719f198 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -484,18 +484,13 @@ static void tcp_keepalive_timer (unsigned long data)
 			}
 		}
 		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
+		tcp_done(sk);
+		goto out;
 	}
 
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-	elapsed = keepalive_time_when(tp);
-
-	/* It is alive without keepalive 8) */
-	if (tp->packets_out || tcp_send_head(sk))
-		goto resched;
-
 	elapsed = tcp_time_stamp - tp->rcv_tstamp;
 
 	if (elapsed >= keepalive_time_when(tp)) {
@@ -522,13 +517,7 @@ static void tcp_keepalive_timer (unsigned long data)
 	TCP_CHECK_TIMER(sk);
 	sk_mem_reclaim(sk);
 
-resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
-	goto out;
-
-death:
-	tcp_done(sk);
-
 out:
 	bh_unlock_sock(sk);
 	sock_put(sk);