Connection timeout problem - keepalives or USER_TIMEOUT not working.

From: NeilBrown
Date: Sun Sep 28 2014 - 20:45:36 EST

Suppose I have a TCP connection to a remote machine and have configured TCP
keep-alive and the new TCP_USER_TIMEOUT so that the connection should close
after a few minutes of not being able to contact the server.

Suppose further that I change my local IP address and then write to the TCP

What should happen?

My first thought was that the keep-alive mechanism would not see a reply and
would close the connection.
TCP_KEEPIDLE and TCP_KEEPINTVL are both 60 seconds, KEEPCNT is 3.
So 3/4 minute should be all I have to wait ... but no.

Due to

/* It is alive without keepalive 8) */
if (tp->packets_out || tcp_send_head(sk))
goto resched;

in tcp_keepalive_timer(), and due to tcp_send_head(sk) being non-NULL,
no keep-alives are sent (which is reasonable) and we never check if packets
have been received recently (which I'm less sure is reasonable).
I tried the patch below and it made a difference, but not quite the
difference I wanted ... I'll get back to that.

Then I found the TCP_USER_TIMEOUT socket opt. That seemed to be exactly what
I wanted. After all keep-alive is for keeping the connection alive when
nothing is being written. I want to make it die even when something is..

So I tried setting TCP_USER_TIMEOUT to 60*3 as milliseconds, so 180,000.
No luck.

There are two places where icsk_user_timeout is considered.
One is in the keep-alive processing which, as discussed above, is disabled
when then it a packet on the way out.
The other is in tcp_write_timeout from tcp_retransmit_timer(). I don't know
why that isn't being called .. maybe the packet isn't being transmitted
because there is no local interface associated with that flow any more???
(tcp_write_wakeup returns -113: EHOSTUNREACH)

The connection eventually breaks after about 20 minutes thanks, I think, to

Is there a bug here? Where is it? How can I get the connection to break?

This was discovered by mounting an NFSv3 filesystem (via tcp, the default),
changing the local IP address, and writing to an open file.
NFS sets the keep-alive to match the timeo and retrans option.
I added code to set TCP_USER_TIMEOUT as well.
+ unsigned int keeptotal =
+ jiffies_to_msecs(xprt->timeout->to_initval)
+ * keepcnt;
+ kernel_setsockopt(sock, SOL_TCP, TCP_USER_TIMEOUT,
+ (char *)&keeptotal, sizeof(keeptotal));

Now I was going to tell you why this patch didn't do what I wanted.
With this patch, we don't bother sending a keep-alive if there are outstanding packets,
but it still triggers a timeout after the appropriate number of probes.
However icsk_probes_out increments much more quickly, thanks to tcp_send_probe0().
So this makes the timeout happen too soon. I guess the confirms that the keep-alive timeout
should be considered irrelevant when there are pending outgoing packets??

Ahh.. Just had another thought. I've added another patch at the end which might
be a bit closer to the "right" approach. Maybe.


diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index df90cd1ce37f..51e8a89c7619 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -597,12 +597,6 @@ static void tcp_keepalive_timer (unsigned long data)
if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
goto out;

- elapsed = keepalive_time_when(tp);
- /* It is alive without keepalive 8) */
- if (tp->packets_out || tcp_send_head(sk))
- goto resched;
elapsed = keepalive_time_elapsed(tp);

if (elapsed >= keepalive_time_when(tp)) {
@@ -618,7 +612,9 @@ static void tcp_keepalive_timer (unsigned long data)
goto out;
- if (tcp_write_wakeup(sk) <= 0) {
+ if (tp->packets_out == 0 &&
+ tcp_send_head(sk) == NULL &&
+ tcp_write_wakeup(sk) <= 0) {
elapsed = keepalive_intvl_when(tp);
} else {
@@ -634,7 +630,6 @@ static void tcp_keepalive_timer (unsigned long data)


inet_csk_reset_keepalive_timer (sk, elapsed);
goto out;

Other patch to extend effect of USER_TIMEOUT

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index df90cd1ce37f..3158cc115a60 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -303,7 +303,8 @@ static void tcp_probe_timer(struct sock *sk)

- if (icsk->icsk_probes_out > max_probes) {
+ if (icsk->icsk_probes_out > max_probes ||
+ (icsk->icsk_user_timeout && icsk->icsk_user_timeout < keepalive_time_elapsed(tp))) {
} else {
/* Only send another probe if we didn't close things up. */

Attachment: pgpCtzrU3MxQL.pgp
Description: OpenPGP digital signature