Re: TCP Stall

Alan Cox (alan@lxorguk.ukuu.org.uk)
Mon, 31 Mar 1997 20:08:13 +0100 (BST)


> Apparently this problem is not considered important because
> absolutely nothing has been done about it for over two
> years. There have been no experimental patches from Network
> gurus attempting to fix this very real and very troublesome
> problem.

Now let me see. Over the past two years the TCP code has actually been
totally rewritten, so that's utter crap.

> Normally, I see the window set at 24,820 (right-hand edge).
> I don't know why. Perhaps someone determined that it was

Because the normal case is that your application is reading data faster
than the network is streaming it. Contrary to your message, the TCP window
is application-to-application flow control.
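
A toy model of that point, sketched in Python. The class and method names
are made up for illustration, the 24,820 figure is borrowed from your trace
purely as an example buffer size, and real stacks add refinements (silly
window avoidance and so on) that are ignored here. The advertised window is
simply the buffer space the application has not yet left unread:

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total buffer space in bytes
        self.queued = 0           # bytes received but not yet read by the app

    def segment_arrives(self, nbytes):
        # Accept only what fits in the remaining space.
        accepted = min(nbytes, self.capacity - self.queued)
        self.queued += accepted
        return accepted

    def app_reads(self, nbytes):
        # The application draining data is what reopens the window.
        taken = min(nbytes, self.queued)
        self.queued -= taken
        return taken

    def advertised_window(self):
        # Free space is what the receiver can offer the sender.
        return self.capacity - self.queued
```

While the application keeps up with the network, queued stays near zero and
the window sits at its full size. Once the application stops reading, the
window shrinks until it reaches zero.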

> the machine that is routing packets to my PPP link, the
> window abruptly goes to zero (0). This is okay, it means "I
> don't have any more room". It could have slowly closed, but

It means the other end decided it had no room and sent a packet with a
zero window. A sudden-looking change like that is indicative of a good
connection and generally means the window size is now smaller than the
effective round trip time. Suddenly losing a block of acks can cause
the same effect.

> correct. It is not allowed to send data when there is no
> room for it. It CAN send packets, however it MUST NOT send
> packets containing data.

On the contrary, it is supposed to send 1 byte of data into the closed
window as a probe.
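
A rough sketch of how such probing is usually paced, in Python. The function
name is made up, and the 5 second start and 60 second ceiling in the usage
note are illustrative assumptions, not values taken from any particular
stack. The only things relied on are that the probe carries one byte and
that the wait between probes backs off exponentially:

```python
PROBE_SIZE = 1  # one byte of data sent into the closed window


def persist_intervals(initial, maximum, count):
    """Return the waits (seconds) before each of `count` zero-window probes."""
    intervals = []
    wait = initial
    for _ in range(count):
        # Never wait longer than the ceiling between probes.
        intervals.append(min(wait, maximum))
        wait *= 2  # back off exponentially between probes
    return intervals
```

For example, persist_intervals(5.0, 60.0, 6) gives waits of 5, 10, 20, 40,
60 and 60 seconds. Each probe forces the receiver to answer with an ack
carrying its current window, so a reopened window is noticed even if the
original window update was lost.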

> Now, how does the machine that received a window of zero
> know that buffers are available again? I watch the Sun send
> a SYN. It receives an ACK with the new window. I don't know

Your trace is bogus. A SYN is the start of a new connection.

> The machine will stall for as much as 30 minutes until the
> sender re-sends an unsolicited data packet (Yes, a packet
> with data even though the window was closed). The packet is

If it stalls that long, something has gone wrong. There is a specific case,
fixed in pre-2.0.30 ISS snapshots, where the cwnd can hit 0 (cwnd is the
window used for network flow control).
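
To see why a cwnd of 0 stalls the sender no matter what the receiver
advertises, here is a minimal sketch in Python. The function name and the
byte counts are illustrative assumptions, not code from the kernel. The
sender may only have as much unacknowledged data outstanding as the smaller
of the two windows allows:

```python
def usable_window(cwnd, rwnd, in_flight):
    """Bytes the sender may still transmit right now.

    cwnd:      congestion window (network flow control), in bytes
    rwnd:      window advertised by the receiver, in bytes
    in_flight: bytes sent but not yet acknowledged
    """
    # The smaller window is the binding limit; never return a negative count.
    return max(0, min(cwnd, rwnd) - in_flight)
```

With cwnd at 0, usable_window(0, 24820, 0) is 0 even though the receiver is
offering a full window, so nothing moves until something resets cwnd.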

> The result is that a 1/2 megabyte file will take up to 2
> hours to be sent on a 56 kb link. Sun's "snoop" seems to be

I'm seeing 8K/second over 64K links quite reliably, so there is more to it
than this.
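
The arithmetic behind those figures, as a small Python sketch. It assumes
"64K" and "56 kb" mean 64,000 and 56,000 bits per second, takes "1/2
megabyte" as 512 * 1024 bytes, and ignores protocol overhead entirely:

```python
def line_rate_bytes_per_second(link_bits_per_second):
    # Eight bits per byte, no header or framing overhead counted.
    return link_bits_per_second / 8


def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    # Time to move the payload if the link were kept full the whole time.
    return size_bytes / line_rate_bytes_per_second(link_bits_per_second)


half_megabyte = 512 * 1024  # assumed meaning of "1/2 megabyte"

print(line_rate_bytes_per_second(64_000))             # 64K link, bytes/second
print(ideal_transfer_seconds(half_megabyte, 56_000))  # seconds on a 56 kb link
print(half_megabyte / (2 * 60 * 60))                  # bytes/second over 2 hours
```

Under those assumptions 8K/second is the full line rate of a 64K link, the
file should take roughly 75 seconds on the 56 kb link, and a two hour
transfer works out to about 73 bytes per second.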

> a lot better at looking for problems than "tcpdump". Tcpdump
> seems to lose a lot of packets. It also fails to interpret
> some of them. When looking for network problems, beware of
> tcpdump. It is not a very good tool. Perhaps if its captured

tcpdump is a very good tool. You've either got a buggy tcpdump, or some
other problem, if it's dropping more than a tiny number of packets. And you
have non-IP stuff on your net if it's failing to decode it. tcpdump can also
capture binary format to a file and then analyse it. It can also filter the
input to avoid excessive load.
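
A sketch of that capture-then-analyse workflow, written as Python helpers
that only build the command lines. The helper names, the interface ppp0,
the file stall.pcap and the address 192.0.2.1 are all hypothetical. The
flags are standard tcpdump ones: -i picks the interface, -w writes raw
packets to a file, -r reads them back and -n skips name lookups:

```python
def capture_command(interface, pcap_path, filter_expr):
    # Write raw packets matching the filter to a file instead of decoding
    # them live, which keeps the load on the capturing machine down.
    return ["tcpdump", "-i", interface, "-w", pcap_path, filter_expr]


def replay_command(pcap_path):
    # Decode a previously saved capture, without DNS lookups.
    return ["tcpdump", "-n", "-r", pcap_path]
```

Something like subprocess.run(capture_command("ppp0", "stall.pcap",
"tcp and host 192.0.2.1")) would then record just the one TCP conversation,
and replay_command("stall.pcap") decodes it afterwards at leisure. Capturing
normally needs root.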

Try 2.0.29 with the ISS patches.

Alan