Re: [PATCH net-next 0/7] tcp: restrict rcv_wnd and window_clamp to representable window

From: Simon Baatz

Date: Thu Apr 09 2026 - 17:25:35 EST


Hi Eric,

On Thu, Apr 09, 2026 at 07:52:03AM -0700, Eric Dumazet wrote:
> On Wed, Apr 8, 2026 at 2:50 PM Simon Baatz via B4 Relay
> <devnull+gmbnomis.gmail.com@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > this series ensures that rcv_wnd and window_clamp do not exceed the
> > maximum window size representable for the connection's window scale
> > factor.
> >
> > This is most visible when TCP window scaling is not used for a
> > connection. In that case, the advertised window is limited to 65535
> > bytes, but rcv_wnd or window_clamp can still grow beyond 65535 when
> > large receive buffers are used. The resulting mismatch breaks
> > calculations that depend on the advertised window, such as the ACK
> > decision in __tcp_ack_snd_check(), and can prevent immediate ACKs.
> >
> > Similar effects may also occur when window scaling is in use, e.g. if
> > the application dynamically adjusts SO_RCVBUF in unusual ways or when
> > the rmem sysctl parameters change during a connection's lifetime.
> >
> > Summary:
> >
> > - Patch 1 keeps rcv_wnd capped by the (window scale-limited)
> > window_clamp at connection start.
> > - Patch 3 and 6 ensure that window_clamp is limited to the
> > representable window when it is updated.
> > - The other patches add packetdrill tests to verify the new behavior.
> >
> > A simple iperf test on a virtme-ng VM (Intel i5-7500, 4 cores,
> > loopback) shows a noticeable improvement with window scaling disabled:
>
> Explain why we should spend time reviewing patches trying to help
> stacks from 2 decades ago,
> risking breaking other usages.
>
> Almost every time we change the rcvbuf logic, we introduce bugs.

As soon as someone gives me access to a link with a bandwidth-delay
product of probably > 500 MB, I am happy to provide another set of
benchmark results. In the meantime, here is a packetdrill script
demonstrating the problem:

./defaults.sh
sysctl -q net.ipv4.tcp_rmem="4096 2147483647 2147483647"

0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 14>
+0 < . 1:1(0) ack 1 win 32792

+0 accept(3, ..., ...) = 4

+0 getsockopt(4, IPPROTO_TCP, 10, [1073725440], [4]) = 0
+0 < P. 1:65001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 65001 win 65535
+0 < P. 65001:130001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 130001 win 65535
+0 < P. 130001:195001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 195001 win 65535
+0 < P. 195001:260001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 260001 win 65535
+0 < P. 260001:325001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 325001 win 65535
+0 < P. 325001:390001(65000) ack 1 win 32792
+0 > . 1:1(0) ack 390001 win 65535
+0 getsockopt(4, IPPROTO_TCP, 10, [2113929215], [4]) = 0
+.1 %{ assert tcpi_rcv_wnd <= 1073725440, tcpi_rcv_wnd }%

Fails with:

AssertionError: 1074511872

on a current kernel.
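To spell out the arithmetic behind the assertion (my own illustration,
not kernel code): the SYN-ACK above advertises wscale 14, so the
largest window this connection can ever encode on the wire is
65535 << 14, which is exactly the TCP_WINDOW_CLAMP value (option 10,
1073725440) returned by the first getsockopt() above:

```python
WSCALE = 14                     # window scale advertised in the SYN-ACK above
max_representable = 65535 << WSCALE
print(max_representable)        # 1073725440, the bound the assertion checks

observed_rcv_wnd = 1074511872   # value printed by the failing assertion
print(observed_rcv_wnd > max_representable)  # True: rcv_wnd exceeds the limit
print(observed_rcv_wnd >> WSCALE)            # 65583, which does not fit in
                                             # the 16-bit window field
```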

So, I think we should spend time reviewing this, because currently we
only pretend to clamp the window to its limits.

> Not using window scaling in 2026 and expecting "iperf improvement" is
> quite something!

I wondered if providing these numbers was a good idea and apparently
it wasn't. I just found the difference to be striking. The only
thing I wanted to demonstrate is that basing our calculations on
bogus window sizes can have real effects.

> Out of curiosity, which legacy product is stuck in the 20th century?

I have half a dozen of these products "stuck in the 20th century" at
home. They are called IoT devices, and I find the claim that TCP
connections to such devices need not have proper sequence number
acceptability tests according to RFC 9293 quite something. ;-)

- Simon

--
Simon Baatz <gmbnomis@xxxxxxxxx>