Re: [PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps

From: Zhang Le
Date: Sun Nov 14 2010 - 09:59:07 EST


On 09:52 Sun 14 Nov , Eric Dumazet wrote:
> Le dimanche 14 novembre 2010 à 15:35 +0800, Zhang Le a écrit :
> > Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
> > req->ts_recent. In this case, theoretically the req should not be ignored.
> >
> > But in fact, it could be ignored, if peer->tcp_ts is so small that the
> > difference between these two numbers is larger than 2 to the power of 31.
> >
> > I understand that in this situation the timestamps do not make sense any more,
> > because they actually come from different machines. However, if anyone
> > ever needs to do the same investigation that I have done, this will
> > save them some time.
> >
> > Signed-off-by: Zhang Le <r0bertz@xxxxxxxxxx>
> > ---
> > net/ipv4/tcp_ipv4.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index 8f8527d..1eb4974 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> > peer->v4daddr == saddr) {
> > inet_peer_refcheck(peer);
> > if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
> > - (s32)(peer->tcp_ts - req->ts_recent) >
> > - TCP_PAWS_WINDOW) {
> > + ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
> > + peer->tcp_ts > req->ts_recent)) {
> > NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
> > goto drop_and_release;
> > }
>
> This seems very wrong to me.
>
> Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
> to help. And it might break some working setups, because of wrap around.

Yeah, you are right. And sorry for overlooking this.

I should have reviewed time_{before,after}'s implementation before posting this.

So it seems we can't do anything to improve this except add a warning to the
documentation, and maybe some comments in the code too.

>
> Really, if you have multiple clients behind a common NAT, you cannot use
> this code at all, since NAT doesn't usually change TCP timestamps.
>
> What about the following patch instead?
>
> [PATCH] doc: extend tcp_tw_recycle documentation
>
> tcp_tw_recycle should not be used on a server if there is a chance
> that clients are behind the same NAT. Document this fact before too many
> users discover it too late.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
> ---
> Documentation/networking/ip-sysctl.txt | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index c7165f4..406f0d5 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
> tcp_tw_recycle - BOOLEAN
> Enable fast recycling TIME-WAIT sockets. Default value is 0.
> It should not be changed without advice/request of technical
> - experts.
> + experts. If you set it to 1, make sure you dont miss connections
> + attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
> + In particular, this might break if several clients are behind
> + a common NAT device, since their TCP timestamp wont be changed
> + by the NAT. tcp_tw_recycle should be used with care, most
> + probably in private networks.
>
> tcp_tw_reuse - BOOLEAN
> Allow to reuse TIME-WAIT sockets for new connections when it is
>
>

--
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
