Re: [PATCH net-next] tcp: clamp window like before the cleanup

From: Eric Dumazet
Date: Thu Mar 06 2025 - 05:04:21 EST


On Thu, Mar 6, 2025 at 10:55 AM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
> Hi Eric,
>
> On 06/03/2025 10:45, Eric Dumazet wrote:
> > On Thu, Mar 6, 2025 at 6:22 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> >>
> >> On Wed, Mar 5, 2025 at 10:49 PM Matthieu Baerts (NGI0)
> >> <matttbe@xxxxxxxxxx> wrote:
> >>>
> >>> A recent cleanup changed the behaviour of tcp_set_window_clamp(). This
> >>> looks unintentional, and affects MPTCP selftests, e.g. some tests
> >>> re-establishing a connection after a disconnect are now unstable.
> >>>
> >>> Before the cleanup, this operation was done:
> >>>
> >>> new_rcv_ssthresh = min(tp->rcv_wnd, new_window_clamp);
> >>> tp->rcv_ssthresh = max(new_rcv_ssthresh, tp->rcv_ssthresh);
> >>>
> >>> The cleanup used the 'clamp' macro which takes 3 arguments -- value,
> >>> lowest, and highest -- and returns a value between the lowest and the
> >>> highest allowable values. This then assumes ...
> >>>
> >>> lowest (rcv_ssthresh) <= highest (rcv_wnd)
> >>>
> >>> ... which is not always the case here according to the MPTCP
> >>> selftests, even when running them without MPTCP, with plain TCP only.
> >>>
> >>> For example, when we have ...
> >>>
> >>> rcv_wnd < rcv_ssthresh < new_rcv_ssthresh
> >>>
> >>> ... before the cleanup, the rcv_ssthresh was not changed, while after
> >>> the cleanup, it is lowered down to rcv_wnd (highest).
> >>>
> >>> During a simple test with TCP, here are the values I observed:
> >>>
> >>>   new_window_clamp (val)   rcv_ssthresh (lo)     rcv_wnd (hi)
> >>>        117760 (out)              65495        <      65536
> >>>        128512 (out)             109595        >      80256  => lo > hi
> >>>       1184975 (out)             328987        <     329088
> >>>
> >>>        113664 (out)              65483        <      65536
> >>>        117760 (out)             110968        <     110976
> >>>        129024 (out)             116527        >     109696  => lo > hi
> >>>
> >>> Here, we can see that it is not that rare for rcv_ssthresh (lo) to be
> >>> higher than rcv_wnd (hi), so the behaviour changes when the clamp()
> >>> macro is used, even without MPTCP.
> >>>
> >>> Note: new_window_clamp is always out of range (rcv_ssthresh < rcv_wnd)
> >>> here, which seems to be generally the case in my tests with small
> >>> connections.
> >>>
> >>> I then suggest reverting this part, so as not to change the behaviour.
> >>>
> >>> Fixes: 863a952eb79a ("tcp: tcp_set_window_clamp() cleanup")
> >>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/551
> >>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
> >>
> >> Tested-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> >>
> >> Thanks for catching this. I should have done more tests :(
> >>
> >> Now I am using netperf with TCP_CRR to test loopback, and I can easily
> >> see the case where tp->rcv_ssthresh is larger than tp->rcv_wnd, which
> >> means tp->rcv_wnd is not always the upper bound, as you said.
> >>
> >> Thanks,
> >> Jason
> >>
> >
> > Patch looks fine to me but all our tests are passing with the current kernel,
> > and I was not able to trigger the condition.
>
> Thank you for having looked at this patch!
>
>
> > Can you share what precise test you did ?
>
> To be able to get a situation where "rcv_ssthresh > rcv_wnd", I simply
> executed the MPTCP Connect selftest. You can also force creating
> TCP-only connections with '-tt', e.g.
>
> ./mptcp_connect.sh -tt

I was asking Jason about TCP tests. He mentioned TCP_CRR.

I ran several of them, with temporary debug code in the kernel, and
could not trigger the issue.
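(Editor's note: for reference, a loopback TCP_CRR run of the kind Jason described can be started as below, assuming the netperf/netserver tools are installed.)

```shell
# Start the netperf server bound to loopback.
netserver -L 127.0.0.1

# TCP_CRR (connect/request/response) opens a new TCP connection per
# transaction, exercising the connect/disconnect path repeatedly.
netperf -H 127.0.0.1 -t TCP_CRR -l 10
```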


I am wondering if this could hide an issue in MPTCP?

>
>
> To be able to reproduce the issue with the selftests mentioned in [1], I
> simply executed ./mptcp_connect.sh in a loop after having applied this
> small patch to execute only a part of the subtests ("disconnect"):
>
> > diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> > index 5e3c56253274..d8ebea5abc6c 100755
> > --- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> > +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
> > @@ -855,6 +855,7 @@ make_file "$sin" "server"
> >
> > mptcp_lib_subtests_last_ts_reset
> >
> > +if false; then
> > check_mptcp_disabled
> >
> > stop_if_error "The kernel configuration is not valid for MPTCP"
> > @@ -882,6 +883,7 @@ mptcp_lib_result_code "${ret}" "ping tests"
> >
> > stop_if_error "Could not even run ping tests"
> > mptcp_lib_pr_ok
> > +fi
> >
> > [ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss delay ${tc_delay}ms
> > tc_info="loss of $tc_loss "
> > @@ -910,6 +912,7 @@ mptcp_lib_pr_info "Using ${tc_info}on ns3eth4"
> >
> > tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${reorder_delay}ms $tc_reorder
> >
> > +if false; then
> > TEST_GROUP="loopback v4"
> > run_tests_lo "$ns1" "$ns1" 10.0.1.1 1
> > stop_if_error "Could not even run loopback test"
> > @@ -959,6 +962,7 @@ log_if_error "Tests with MPTFO have failed"
> > run_test_transparent 10.0.3.1 "tproxy ipv4"
> > run_test_transparent dead:beef:3::1 "tproxy ipv6"
> > log_if_error "Tests with tproxy have failed"
> > +fi
> >
> > run_tests_disconnect
> > log_if_error "Tests of the full disconnection have failed"
>
> Note that our CI was able to easily reproduce it. Locally, it was taking
> around 30 to 50 iterations to reproduce the issue.
>
> [1] https://github.com/multipath-tcp/mptcp_net-next/issues/551
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>