Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
From: Eric Dumazet
Date: Sun Nov 14 2010 - 03:52:39 EST
On Sunday, 14 November 2010 at 15:35 +0800, Zhang Le wrote:
> Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> req->ts_recent. In this case, theoretically the req should not be ignored.
>
> But in fact, it could be ignored, if peer->tcp_ts is so small that the
> difference between these two numbers is larger than 2 to the power of 31.
>
> I understand that under this situation, the timestamp does not make sense any
> more, because it actually comes from different machines. However, if anyone
> ever needs to do the same investigation which I have done, this will
> save some time for him.
>
> Signed-off-by: Zhang Le <r0bertz@xxxxxxxxxx>
> ---
> net/ipv4/tcp_ipv4.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 8f8527d..1eb4974 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> peer->v4daddr == saddr) {
> inet_peer_refcheck(peer);
> if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> - (s32)(peer->tcp_ts - req->ts_recent) >
> - TCP_PAWS_WINDOW) {
> + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> + peer->tcp_ts > req->ts_recent)) {
> NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> goto drop_and_release;
> }
This seems very wrong to me.
Adding a 'peer->tcp_ts > req->ts_recent' condition is _not_ going to
help, and it might break some working setups because of timestamp
wraparound.
Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT usually doesn't change TCP timestamps.
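To illustrate both points, here is a minimal userspace sketch of the two
comparisons. It is not kernel code: the function names are made up for this
example, and PAWS_WINDOW is assumed to be 1 to mirror TCP_PAWS_WINDOW.

```c
#include <stdint.h>

/* Assumed value for illustration; mirrors TCP_PAWS_WINDOW. */
#define PAWS_WINDOW 1

/* Existing check: the signed 32-bit difference stays correct when the
 * timestamp counter wraps around. Returns 1 if the SYN would be dropped. */
int rejects_original(uint32_t peer_ts, uint32_t req_ts)
{
	return (int32_t)(peer_ts - req_ts) > PAWS_WINDOW;
}

/* Check with the extra unsigned comparison from the proposed patch.
 * The unsigned comparison is not wraparound-safe. */
int rejects_patched(uint32_t peer_ts, uint32_t req_ts)
{
	return (int32_t)(peer_ts - req_ts) > PAWS_WINDOW &&
	       peer_ts > req_ts;
}
```

With a single host whose timestamp has just wrapped (stored value 5, stale
SYN carrying 0xFFFFFFF0), the existing check drops the stale SYN while the
patched one lets it through. With two hosts behind a NAT (stored 1000000,
new SYN carrying 500), both versions still drop the SYN, so the extra
condition only covers some of the NAT cases.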
What about the following patch instead?
[PATCH] doc: extend tcp_tw_recycle documentation
tcp_tw_recycle should not be used on a server if there is a chance
clients are behind the same NAT. Document this fact before too many users
discover this too late.
Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
---
Documentation/networking/ip-sysctl.txt | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical
- experts.
+ experts. If you set it to 1, make sure you dont miss connections
+ attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+ In particular, this might break if several clients are behind
+ a common NAT device, since their TCP timestamp wont be changed
+ by the NAT. tcp_tw_recycle should be used with care, most
+ probably in private networks.
tcp_tw_reuse - BOOLEAN
Allow to reuse TIME-WAIT sockets for new connections when it is