Re: 2.6.17 regression: Very slow net transfer from some hosts

From: Stephen Hemminger
Date: Tue Apr 11 2006 - 19:59:32 EST


On Wed, 12 Apr 2006 01:06:09 +0100
Daniel Drake <dsd@xxxxxxxxxx> wrote:

> Stephen Hemminger wrote:
> >> This is very familiar, and I just found the article I was thinking of:
> >> http://lwn.net/Articles/92727/
> >>
> >> I was also hit by that bug, on the same collection of websites, but that
> >> particular problem was fixed for 2.6.8 or so. So I guess it is extremely
> >> likely that my ISP has broken routers. nmap isn't able to identify the
> >> OS of any ISP routers in my path.
> >
> > We never fixed it, its kind of hard to fix other peoples equipment ;-)
>
> Weird, things started working for me around 2.6.9 without having to
> modify any sysctl stuff.

What we did was default the window scaling needed to match the max
possible memory usage. The normal value for tcp_wmem correlated to
a window scale of 2. If a corrupting middlebox lost the window
scale option, then connection would proceed but all windows would
be 1/4 of possible; and connection would still limp along at
somewhat reduced bandwidth.

John's changes cause tcp_wmem to be bigger, so we ask for a bigger
window scale. If the "window scale lost in translation" problem gets
too bad, the sender will never send anything because it thinks
the receiver is doing silly-window-syndrome.


>
> > Turn off TCP window scaling, your performance will be limited but about
> > as good as you can get with a corrupting firewall in between.
>
> I was wrong in my previous mail where I said that the rmem/wmem output
> hasn't changed over the two kernels - it has, the 3rd column differs. I
> simply set those values back to what they were on 2.6.16 and now things
> work again - I presumably have window scale 2 (scale factor 4) again,
> which appears to be a decent compromise between having a window and
> things actually working.
>
> For anyone else interested, the ISP is NTL (UK). The fix:
>
> echo "4096 16384 131072 " > /proc/sys/net/ipv4/tcp_wmem
> echo "4096 87380 174760 " > /proc/sys/net/ipv4/tcp_rmem
>
>
> This issue is visible on my 1GB system but not on my laptop (256mb RAM).
> The key thing is that more memory means a higher window scale factor is
> used, which appears to trigger ntl's brokenness.
>
> Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/