Re: [net-next PATCH] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
From: Yuchung Cheng
Date: Sat Dec 12 2020 - 13:36:02 EST
On Fri, Dec 11, 2020 at 5:28 PM Alexander Duyck
<alexander.duyck@xxxxxxxxx> wrote:
>
> From: Alexander Duyck <alexanderduyck@xxxxxx>
>
> There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
> message in the case of IPv6 or a fragmentation request in the case of
> IPv4. This results in the socket stalling for a second or more as it does
> not respond to the message by retransmitting the SYN frame.
>
> Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
> ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that
> makes use of the entire MSS. In the case of fastopen it does, and an
> additional complication is that the retransmit queue doesn't contain the
> original frames. As a result when tcp_simple_retransmit is called and
> walks the list of frames in the queue it may not mark the frames as lost
> because both the SYN and the data packet each individually are smaller than
> the MSS size after the adjustment. This results in the socket being stalled
> until the retransmit timer kicks in and forces the SYN frame out again
> without the data attached.
>
> In order to resolve this we can generate our best estimate for the original
> packet size by detecting the fastopen SYN frame and then adding the
> overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would
> have exceeded the MSS. If so we can mark the frame as lost and retransmit
> it.
>
> Signed-off-by: Alexander Duyck <alexanderduyck@xxxxxx>
> ---
> net/ipv4/tcp_input.c | 30 +++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 9e8a6c1aa019..79375b58de84 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2686,11 +2686,35 @@ static void tcp_mtup_probe_success(struct sock *sk)
> void tcp_simple_retransmit(struct sock *sk)
> {
> const struct inet_connection_sock *icsk = inet_csk(sk);
> + struct sk_buff *skb = tcp_rtx_queue_head(sk);
> struct tcp_sock *tp = tcp_sk(sk);
> - struct sk_buff *skb;
> - unsigned int mss = tcp_current_mss(sk);
> + unsigned int mss;
> +
> + /* A fastopen SYN request is stored as two separate packets within
> + * the retransmit queue, this is done by tcp_send_syn_data().
> + * As a result simply checking the MSS of the frames in the queue
> + * will not work for the SYN packet. So instead we must make a best
> + * effort attempt by validating the data frame with the mss size
> + * that would be computed now by tcp_send_syn_data and comparing
> + * that against the data frame that would have been included with
> + * the SYN.
> + */
> + if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN && tp->syn_data) {
> + struct sk_buff *syn_data = skb_rb_next(skb);
> +
> + mss = tcp_mtu_to_mss(sk, icsk->icsk_pmtu_cookie) +
> + tp->tcp_header_len - sizeof(struct tcphdr) -
> + MAX_TCP_OPTION_SPACE;
nice comment! The original syn_data mss needs to be inferred which is
a hassle to get right. my sense is path-mtu issue is enough to warrant
they are lost.
I suggest simply mark syn & its data lost if tcp_simple_retransmit is
called during TFO handshake, i.e.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 62f7aabc7920..7f0c4f2947eb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2864,7 +2864,8 @@ void tcp_simple_retransmit(struct sock *sk)
unsigned int mss = tcp_current_mss(sk);
skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
- if (tcp_skb_seglen(skb) > mss)
+ if (tcp_skb_seglen(skb) > mss ||
+ (tp->syn_data && sk->sk_state == TCP_SYN_SENT))
tcp_mark_skb_lost(sk, skb);
}
We have a TFO packetdrill test that verifies my suggested fix should
trigger an immediate retransmit vs 1s wait.
>
> - skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> + if (syn_data && syn_data->len > mss)
> + tcp_mark_skb_lost(sk, skb);
> +
> + skb = syn_data;
> + } else {
> + mss = tcp_current_mss(sk);
> + }
> +
> + skb_rbtree_walk_from(skb) {
> if (tcp_skb_seglen(skb) > mss)
> tcp_mark_skb_lost(sk, skb);
> }
>
>