Re: [net PATCH] tcp: Mark fastopen SYN packet as lost when receiving ICMP_TOOBIG/ICMP_FRAG_NEEDED
From: Alexander Duyck
Date: Fri Dec 11 2020 - 16:53:21 EST
On Fri, Dec 11, 2020 at 11:18 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Fri, Dec 11, 2020 at 6:15 PM Alexander Duyck
> <alexander.duyck@xxxxxxxxx> wrote:
> >
> > On Fri, Dec 11, 2020 at 8:22 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Dec 11, 2020 at 5:03 PM Alexander Duyck
> > > <alexander.duyck@xxxxxxxxx> wrote:
> > >
> > > > That's fine. I can target this for net-next. I had just selected net
> > > > since I had considered it a fix, but I suppose it could be considered
> > > > a behavioral change.
> > >
> > > We are very late in the 5.10 cycle, and we never handled ICMP in this
> > > state, so net-next is definitely better.
> > >
> > > Note that RFC 7413 states in 4.1.3 :
> > >
> > > The client MUST cache cookies from servers for later Fast Open
> > > connections. For a multihomed client, the cookies are dependent on
> > > the client and server IP addresses. Hence, the client should cache
> > > at most one (most recently received) cookie per client and server IP
> > > address pair.
> > >
> > > When caching cookies, we recommend that the client also cache the
> > > Maximum Segment Size (MSS) advertised by the server. The client can
> > > cache the MSS advertised by the server in order to determine the
> > > maximum amount of data that the client can fit in the SYN packet in
> > > subsequent TFO connections. Caching the server MSS is useful
> > > because, with Fast Open, a client sends data in the SYN packet before
> > > the server announces its MSS in the SYN-ACK packet. If the client
> > > sends more data in the SYN packet than the server will accept, this
> > > will likely require the client to retransmit some or all of the data.
> > > Hence, caching the server MSS can enhance performance.
> > >
> > > Without a cached server MSS, the amount of data in the SYN packet is
> > > limited to the default MSS of 536 bytes for IPv4 [RFC1122] and 1220
> > > bytes for IPv6 [RFC2460]. Even if the client complies with this
> > > limit when sending the SYN, it is known that an IPv4 receiver
> > > advertising an MSS less than 536 bytes can receive a segment larger
> > > than it is expecting.
> > >
> > > If the cached MSS is larger than the typical size (1460 bytes for
> > > IPv4 or 1440 bytes for IPv6), then the excess data in the SYN packet
> > > may cause problems that offset the performance benefit of Fast Open.
> > > For example, the unusually large SYN may trigger IP fragmentation and
> > > may confuse firewalls or middleboxes, causing SYN retransmission and
> > > other side effects. Therefore, the client MAY limit the cached MSS
> > > to 1460 bytes for IPv4 or 1440 for IPv6.
> > >
> > >
> > > Relying on ICMP is fragile, since they can be filtered in some way.
> >
> > In this case I am not relying on the ICMP, but thought that since I
> > have it I should make use of it. WIthout the ICMP we would still just
> > be waiting on the retransmit timer.
> >
> > The problem case has a v6-in-v6 tunnel between the client and the
> > endpoint so both ends assume an MTU 1500 and advertise a 1440 MSS
> > which works fine until they actually go to send a large packet between
> > the two. At that point the tunnel is triggering an ICMP_TOOBIG and the
> > endpoint is stalling since the MSS is dropped to 1400, but the SYN and
> > data payload were already smaller than that so no retransmits are
> > being triggered. This results in TFO being 1s slower than non-TFO
> > because of the failure to trigger the retransmit for the frame that
> > violated the PMTU. The patch is meant to get the two back into
> > comparable times.
>
> Okay... Have you studied why tcp_v4_mtu_reduced() (and IPv6 equivalent)
> code does not yet handle the retransmit in TCP_SYN_SENT state ?
The problem lies in tcp_simple_retransmit(). Specifically the loop at
the start of the function goes to check the retransmit queue to see if
there are any packets larger than MSS and finds none since we don't
place the SYN w/ data in there and instead have a separate SYN and
data packet.
I'm debating if I should take an alternative approach and modify the
loop at the start of tcp_simple_transmit to add a check for a SYN
packet, tp->syn_data being set, and then comparing the next frame
length + MAX_TCP_HEADER_OPTIONS versus mss.