Re: [TCP bug, regression] stuck distcc connections in latest -git

From: Ilpo Järvinen
Date: Thu Jul 24 2008 - 06:03:48 EST


On Thu, 24 Jul 2008, Ilpo Järvinen wrote:

> On Thu, 24 Jul 2008, Herbert Xu wrote:
>
> > Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > > the client (running 2.6.24) does periodic 120 seconds retransmits:
> > >
> > > 07:40:48.255452 IP dione.39201 > phoenix.distcc: . 1608:2144(536) ack 1 win 5840
> > > 07:40:48.255547 IP phoenix.distcc > dione.39201: . ack 2144 win 65535
> > > 07:40:48.255564 IP dione.39201 > phoenix.distcc: . 67143:67679(536) ack 1 win 5840
> > > 07:40:48.255648 IP phoenix.distcc > dione.39201: . ack 2144 win 65535
> > > 07:42:48.255440 IP dione.39201 > phoenix.distcc: . 2144:2680(536) ack 1 win 5840
> > > 07:42:48.255559 IP phoenix.distcc > dione.39201: . ack 2680 win 65535
> > > 07:42:48.255570 IP dione.39201 > phoenix.distcc: . 67679:68215(536) ack 1 win 5840
> > > 07:42:48.255659 IP phoenix.distcc > dione.39201: . ack 2680 win 65535
> > > 07:44:48.255436 IP dione.39201 > phoenix.distcc: . 2680:3216(536) ack 1 win 5840
> > > 07:44:48.255570 IP phoenix.distcc > dione.39201: . ack 3216 win 65535
> > > 07:44:48.255585 IP dione.39201 > phoenix.distcc: . 68215:68751(536) ack 1 win 5840
> > > 07:44:48.255669 IP phoenix.distcc > dione.39201: . ack 3216 win 65535
> >
> > OK, something's seriously screwed up on dione's kernel. Could
> > you please disable syncookies (which should enable SACK for you)
> > and see if the problem goes away?
>
> This looks like the FRTO bugs we fixed in 2.6.25.7. AFAIK 2.6.24.y was no
> longer being updated at that time, so it's a bit obsolete...

But this behavior might point to something very interesting in the change
on the opposite end, since one needs a considerable amount of losses in
the outstanding window to trigger long delays. The bug was that FRTO never
fell back to go-back-n retransmissions, so one RTO was necessary per loss.
Like you found out, slow progress is made and the situation can resolve
itself (by a big cumulative ACK once all losses are cleared). If the
receiver for some reason did not accept the new data segment that FRTO
sends after getting the ACK of the retransmission, the RTO loop would
continue forever with the FRTO bug (unless userland tears the connection
down).
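
To put rough numbers on "one RTO per loss", here is a back-of-the-envelope
sketch. It is not kernel code: stall_seconds() and its parameters are made
up for illustration, and the worst case assumes that every segment between
the cumulative ACK (2144) and the new data FRTO sends (67143) in the trace
was lost, which the trace does not actually prove.

```python
def stall_seconds(lost_segments, rto_seconds, go_back_n_fallback):
    """Rough time spent waiting on RTOs to repair `lost_segments` holes.

    Hypothetical model: only RTO waits are counted, RTTs are ignored.
    """
    if lost_segments == 0:
        return 0
    if go_back_n_fallback:
        # After the first RTO the sender retransmits the whole window,
        # so a single timeout covers every hole.
        return rto_seconds
    # Buggy behaviour described above: each RTO repairs exactly one hole.
    return lost_segments * rto_seconds


SEGMENT = 536   # bytes per segment, as seen in the trace
RTO = 120       # seconds between retransmits, as seen in the trace

# Assumed upper bound on the number of holes (see the caveat above).
worst_case_holes = (67143 - 2144) // SEGMENT

buggy = stall_seconds(worst_case_holes, RTO, go_back_n_fallback=False)
fixed = stall_seconds(worst_case_holes, RTO, go_back_n_fallback=True)
print(f"holes={worst_case_holes} buggy={buggy}s fixed={fixed}s")
```

Under that assumption the buggy path needs on the order of hours where a
go-back-n fallback needs a single timeout, which fits the slow but nonzero
progress visible in the trace.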


--
i.