[RFC 3/3] tcp: Adjust congestion window handling for mtu probe

From: Leonard Crestez
Date: Tue May 11 2021 - 08:04:54 EST


When preparing an MTU probe, Linux checks for snd_cwnd >= 11 and for
room for 2 more packets alongside what is currently in flight. The
reasoning behind these constants is unclear. Replace them with checks
based on the required probe size:

* Skip probing if congestion window is too small to ever fit a probe.
* Wait for the congestion window to drain if too many packets are
already in flight.

This is very similar to snd_wnd logic except packets are counted instead
of bytes.

This patch allows MTU probing at smaller cwnd values.
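
The two packet-based checks can be modelled outside the kernel as a
small stand-alone function. This is only an illustrative sketch:
cwnd_probe_check() and its parameters are hypothetical names that do not
exist in the kernel, and the return values simply mirror the convention
used in tcp_mtu_probe() (-1 to skip probing, 0 to wait, 1 when the
congestion window permits a probe).

```c
/* Ceiling division, mirroring the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical user-space model of the cwnd checks in this patch.
 * size_needed and mss are in bytes; snd_cwnd and in_flight are in packets.
 */
static int cwnd_probe_check(unsigned int size_needed, unsigned int mss,
			    unsigned int snd_cwnd, unsigned int in_flight)
{
	unsigned int packets_needed = DIV_ROUND_UP(size_needed, mss);

	/* The probe can never fit inside this cwnd: skip probing. */
	if (packets_needed > snd_cwnd)
		return -1;

	/* The probe fits, but not yet: wait for cwnd to drain. */
	if (in_flight + packets_needed > snd_cwnd)
		return 0;

	/* cwnd has room for the probe right now. */
	return 1;
}
```

With made-up values size_needed = 6000 and mss = 1448, packets_needed is
5, so a cwnd of 4 skips probing, a cwnd of 8 with 6 packets in flight
waits, and a cwnd of 8 with 3 packets in flight allows the probe.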

On very fast links, after successive successful MTU probes, the cwnd
(measured in packets) shrinks and does not grow again because
tcp_is_cwnd_limited() returns false. If snd_cwnd falls below 11, no
more probes are sent despite the link being otherwise idle.

Signed-off-by: Leonard Crestez <cdleonard@xxxxxxxxx>
---
net/ipv4/tcp_output.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7cd1e8fd9749..ccf3eb29e7a5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2369,10 +2369,11 @@ static int tcp_mtu_probe(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb, *nskb, *next;
struct net *net = sock_net(sk);
int probe_size;
int size_needed;
+ int packets_needed;
int copy, len;
int mss_now;
int interval;

/* Not currently probing/verifying,
@@ -2381,20 +2382,20 @@ static int tcp_mtu_probe(struct sock *sk)
* not SACKing (the variable headers throw things off)
*/
if (likely(!icsk->icsk_mtup.enabled ||
icsk->icsk_mtup.probe_size ||
inet_csk(sk)->icsk_ca_state != TCP_CA_Open ||
- tp->snd_cwnd < 11 ||
tp->rx_opt.num_sacks || tp->rx_opt.dsack))
return -1;

/* Use binary search for probe_size between tcp_mss_base,
* and current mss_clamp. if (search_high - search_low)
* smaller than a threshold, backoff from probing.
*/
mss_now = tcp_current_mss(sk);
size_needed = tcp_mtu_probe_size_needed(sk, &probe_size);
+ packets_needed = DIV_ROUND_UP(size_needed, tp->mss_cache);

interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low;
/* When misfortune happens, we are reprobing actively,
* and then reprobe timer has expired. We stick with current
* probing process by not resetting search range to its orignal.
@@ -2406,26 +2407,26 @@ static int tcp_mtu_probe(struct sock *sk)
*/
tcp_mtu_check_reprobe(sk);
return -1;
}

+ /* Can the probe fit inside snd_cwnd? */
+ if (packets_needed > tp->snd_cwnd)
+ return -1;
+
/* Have enough data in the send queue to probe? */
if (tp->write_seq - tp->snd_nxt < size_needed)
return net->ipv4.sysctl_tcp_mtu_probe_autocork ? 0 : -1;

if (tp->snd_wnd < size_needed)
return -1;
if (after(tp->snd_nxt + size_needed, tcp_wnd_end(tp)))
return 0;

- /* Do we need to wait to drain cwnd? With none in flight, don't stall */
- if (tcp_packets_in_flight(tp) + 2 > tp->snd_cwnd) {
- if (!tcp_packets_in_flight(tp))
- return -1;
- else
- return 0;
- }
+ /* Wait for snd_cwnd to drain */
+ if (tcp_packets_in_flight(tp) + packets_needed > tp->snd_cwnd)
+ return 0;

if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
return -1;

/* We're allowed to probe. Build it now. */
--
2.25.1