Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled

From: Ilpo Järvinen
Date: Fri Jun 15 2018 - 06:35:58 EST

On Fri, 15 Jun 2018, Michal Kubecek wrote:

> On Fri, Jun 15, 2018 at 11:05:03AM +0300, Ilpo Järvinen wrote:
> > On Thu, 14 Jun 2018, Michal Kubecek wrote:
> >
> > My point was that the new data segment bursts that occur if the sender
> > isn't application limited indicate that there's something going wrong
> > with FRTO. And that wrong is also what is causing that RTO loop because
> > the sender doesn't see the previous FRTO recovery on second RTO. With
> > my FRTO undo fix, (new_recovery || icsk->icsk_retransmits) will be false
> > and that will prevent the RTO loop.
> Yes, it would prevent the loop in this case (except it would be a bit
> later, after second RTO rather than after first).

Hmm, I'm actually wrong about the new data missing bit I think. After
reading more code I'm quite sure conventional RTO recovery is triggered
right away (as long as that bogus undo that ends the recovery wouldn't
occur first like it does without my fix). So no second RTO would be

> But I'm not convinced
> the logic of the patch is correct. If I understand it correctly, it
> essentially changes "presumption of innocence" (if we get an ack past
> what we retransmitted, we assume it was received earlier - i.e. would
> have been sacked before if SACK was in use) to "presumption of guilt"
> (whenever a retransmitted segment is acked, we assume nothing else acked
> with it was received earlier). Or that you trade false negatives for
> false positives.

FRTO depends on knowing for sure what packet (original pre-RTO one or
something that was transmitted post-RTO) triggered the ACK. If FRTO
isn't sure that the ACK was generated by a post-RTO packet, it must
not assume innocence! This change in practice affects just the time while
the segment rexmitted by RTO is still there, that is, processing in step 2
(if we get a cumulative ACK beyond it because the next loss is not for the
subsequent segment but for some later segment, FLAG_ORIG_SACK_ACKED is set
and we'll incorrectly do step 3b while still in FRTO has only reached step
2 for real; this is fixed by my patch). ...The decision about
positive/negative only occurs _after_ that in step 3.

> Maybe I understand it wrong but it seems that you de facto prevent
> Step (3b) from ever happening in non-SACK case.

Only if any of skb that was ACKed had been retransmitted. There shouldn't
be any in step 3 because the RTO rexmit was ACKed (and also because
of how new_recovery variable works wrt. earlier recoveries). Thus, in
step 3 the fix won't clear FLAG_ORIG_SACK_ACKED flag allowing FRTO to
detect spurious RTOs using step 3b.

> > > > No! The window should not update window on ACKs the receiver intends to
> > > > designate as "duplicate ACKs". That is not without some potential cost
> > > > though as it requires delaying window updates up to the next cumulative
> > > > ACK. In the non-SACK series one of the changes is fixing this for
> > > > non-SACK Linux TCP flows.
> > >
> > > That sounds like a reasonable change (at least at the first glance,
> > > I didn't think about it too deeply) but even if we fix Linux stack to
> > > behave like this, we cannot force everyone else to do the same.
> >
> > Unfortunately I don't know what the other stacks besides Linux do. But
> > for Linux, the cause for the changing receiver window is the receiver
> > window auto-tuning and I'm not sure if other stacks have a similar
> > feature (or if that affects (almost) all ACKs like in Linux).
> The capture from my previous e-mail and some others I have seen indicate
> that at least some implementations do not take care to never change
> window size when responding to an out-of-order segment. That means that
> even if we change linux TCP this way (we might still need to send
> a separate window update in some cases), we still cannot rely on others
> doing the same.

Those implementations ignore what is a duplicate ACK (RFC5681, which
is also pointed into by RFC5682 for its defination):
DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
"duplicate" ... (e)
the advertised window in the incoming acknowledgment equals the
advertised window in the last incoming acknowledgment.

Not sending duplicate ACKs also means that fast recovery will not work
for those flows but that may not show up performance wise as long as you
have enough capacity for any unnecessary rexmits the forced RTO recovery
is going to do. RTO recovery may actually improve completion times for
non-SACK flows as NewReno recovers only one lost pkt/RTT where as RTO
recovery needs log(outstanding packets) RTTs at worst. For a large number
of losses in a window, the log is going to win.

> I checked the capture attached to my previous e-mail again and there is
> one thing where our F-RTO implementation (in 4.4, at least) is wrong,
> IMHO. While the first ACK after "new data" (sent in (2b)) was a window
> update (and therefore not dupack by definition) so that we could take
> neither (3a) nor (3b), in some iterations there were further acks which
> did not change window size. The text just before Step 1 says
> The F-RTO algorithm does not specify actions for receiving
> a segment that neither acknowledges new data nor is a duplicate
> acknowledgment. The TCP sender SHOULD ignore such segments and
> wait for a segment that either acknowledges new data or is
> a duplicate acknowledgment.
> My understanding is that this means that while the first ack after new
> data is correctly ignored, the following ack which preserves window size
> should be recognized as a dupack and (3a) should be taken.

Linux FRTO never gets that far (without my fix) if the ACK in step 2
covers beyond the RTO rexmit because 3b is prematurely invoked, that's
why you never see what would occur if 3a is taken. TCP thinks it's not
recovering anymore and therefore can send only new data (if there's some

This is what I tried to tell earlier, with new data there you see there's
something else wrong too with FRTO besides the RTO loop.