TCP Stall

Richard B. Johnson (root@analogic.com)
Sun, 30 Mar 1997 21:35:26 -0500 (EST)


March 30, 1997
Via PPP from Groveland, Massachusetts, USA at about 10 baud
on a 56 kb link.

Gentlemen,

I have been looking into the TCP Stall problem when FTPing
files between Linux machines via a PPP link. This problem
also occurs with remotely mounted file-systems. I have
reported this problem over 10 times during the past two
years and I have recorded at least 25 other instances in
which other users have reported the same problem.

I dump all communications on specific problems that I have
encountered into separate Pine "folders", so it's easy to
maintain a history of a specific problem.

Apparently this problem is not considered important because
absolutely nothing has been done about it for over two
years. There have been no experimental patches from Network
gurus attempting to fix this very real and very troublesome
problem.

Instead, I see on the "list" much more important ideas about
graphical boot-up and other esoterics.

My setup consists of a Linux router, quark.analogic.com,
which is visible on the Internet via a Cisco interface. This
router uses Ethernet for its primary communications
interface. This router establishes a PPP Link to another
router, skunkworks.analogic.com. This router handles
Ethernet traffic from my LAN at home to the router at work.
When everything is working correctly, I can access all my
work computers and any Internet services, from any of my
machines at home. All of the machines at home, and the
router at work, are Linux machines. Other machines at work
are Sun Pizza Boxes, SGI machines, a new Alpha, and several
old VAXen. I have connectivity to all these machines when
the PPP Link is running.

The problem is that data being transferred between links
that use megabit speeds and links that use kilo-bit speeds
needs flow control.

The RFCs address flow-control using a variable length
window. RFC-793 addresses the basic window method for flow
control. Since it was written, there has been extensive work
on TCP algorithms to optimize data communications. RFC-1122
addresses the "Silly Window Syndrome".

The Nagle algorithm, implemented in Linux, works to
discourage sending tiny segments when the data to be sent
increases in small increments, while SWS avoidance
discourages small segments all the time. It is possible, if
the implementation is not robust, for the receiver to send
two or more ACKs per segment received.
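
As a rough illustration of the Nagle rule described above, here
is a minimal sketch in Python. The function and parameter names
are made up for this example and are not taken from the Linux
sources; the check against the receiver's window is left out.

```python
def nagle_may_send(queued_bytes, mss, unacked_bytes):
    """Decide whether a segment may go out now under the Nagle rule.

    queued_bytes  -- data waiting in the send buffer (hypothetical name)
    mss           -- maximum segment size for the connection
    unacked_bytes -- data already sent but not yet acknowledged
    """
    # A full-sized segment is always worth sending.
    if queued_bytes >= mss:
        return True
    # A small segment goes out only if nothing is in flight;
    # otherwise it waits and coalesces with later writes.
    return queued_bytes > 0 and unacked_bytes == 0
```

With these assumptions, a 10-byte write is held back while 500
bytes are still unacknowledged, and released once the ACK arrives.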

Jacobson addresses this problem with the "slow start"
portion of his algorithm. This algorithm is also provided by
default in Linux. Failure of either of these algorithms to
be correctly implemented could cause the problems being
observed.
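
For readers unfamiliar with slow start, the sketch below shows
its general shape in Python: the congestion window starts at one
segment and roughly doubles each round trip until it reaches a
threshold, after which it grows by one segment per round trip.
The names are illustrative and the per-round-trip model is a
simplification. This is not the Linux implementation.

```python
def congestion_window_history(mss, ssthresh, rounds):
    """Return the congestion window at the start of each round trip.

    mss      -- segment size in bytes
    ssthresh -- slow-start threshold in bytes (assumed fixed here)
    rounds   -- number of loss-free round trips to simulate
    """
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: one extra segment per ACK, which doubles
            # the window every round trip (clamped at ssthresh).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: about one segment per round trip.
            cwnd += mss
    return history
```

For example, with a 1000-byte segment and an 8000-byte threshold
the window goes 1000, 2000, 4000, 8000, 9000, 10000 over six
round trips.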

Normally, I see the window set at 24,820 (right-hand edge).
I don't know why. Perhaps someone determined that it was
optimum. I observe that when the receive buffer gets full on
the machine that is routing packets to my PPP link, the
window abruptly goes to zero (0). This is okay; it means "I
don't have any more room". It could have slowly closed, but
it doesn't. When the window is zero, the machine attempting
to send data to the router stops sending data. This is
correct. It is not allowed to send data when there is no
room for it. It CAN send packets; however, it MUST NOT send
packets containing data.
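
The rule the sender is following can be sketched in Python using
the variable names from RFC-793 (SND.UNA, SND.NXT, SND.WND). The
function names are invented for this example, and
sequence-number wraparound is ignored.

```python
def usable_window(snd_una, snd_nxt, snd_wnd):
    """Bytes of new data the sender may still put on the wire.

    snd_una -- oldest unacknowledged sequence number (SND.UNA)
    snd_nxt -- next sequence number to be sent (SND.NXT)
    snd_wnd -- window most recently advertised by the receiver (SND.WND)
    """
    return max(0, snd_una + snd_wnd - snd_nxt)


def may_send_data(snd_una, snd_nxt, snd_wnd):
    """True only when at least one byte of data fits in the window."""
    return usable_window(snd_una, snd_nxt, snd_wnd) > 0
```

With a window of 24,820 and nothing in flight the sender may send
24,820 bytes; once the advertised window drops to zero the result
is zero and no data may be sent.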

Now, how does the machine that received a window of zero
know that buffers are available again? I watch the Sun send
a SYN. It receives an ACK with the new window. I don't know
if this is the correct thing to do according to the RFCs,
but it works. It is likely that the routing machine, i.e.,
the one whose buffers are loaded with data and which is
trying to free them by squeezing the data into the PPP link,
should be the machine to send a SYN when buffers are
available again.

RFC-1122 defines a standard way to "probe" for the new
window after the window has shrunk to zero. This is shown in
4.2.2.17.
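
Section 4.2.2.17 says the sender SHOULD send the first probe once
the zero window has lasted for the retransmission timeout, and
SHOULD increase the interval between probes exponentially. A
minimal Python sketch of such a schedule follows. The function
name and the upper bound on the interval are assumptions made for
this example; the RFC does not fix a particular cap.

```python
def zero_window_probe_times(rto, max_interval, count):
    """Return the times (seconds after the window closed) of each probe.

    rto          -- retransmission timeout in seconds
    max_interval -- assumed cap on the gap between probes (illustrative)
    count        -- number of probes to schedule
    """
    times = []
    elapsed = 0.0
    interval = rto
    for _ in range(count):
        elapsed += interval
        times.append(elapsed)
        # Exponential backoff between successive probes, up to the cap.
        interval = min(interval * 2, max_interval)
    return times
```

With a one-second timeout and a 60-second cap the first five
probes fall at 1, 3, 7, 15 and 31 seconds, so a sender that
probes this way should notice a reopened window within about a
minute rather than after a stall of half an hour.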

This does not appear to happen with the Linux machines,
although it has been confirmed that "tcpdump" will
randomly drop packets, often the important ones for
which you are watching, so the probes may simply have been
missed.

The machine will stall for as much as 30 minutes until the
sender re-sends an unsolicited data packet (Yes, a packet
with data even though the window was closed). The packet is
ACKed with the new window and normal data-flow restarts
until the router's buffer is full again. This continues
until the file has finally been sent.

The result is that a 1/2 megabyte file will take up to 2
hours to be sent on a 56 kb link. Sun's "snoop" seems to be
a lot better at looking for problems than "tcpdump". Tcpdump
seems to lose a lot of packets. It also fails to interpret
some of them. When looking for network problems, beware of
tcpdump. It is not a very good tool. Perhaps if its captured
binary data were first written to a file, it would not lose
so much information.
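
To put the two-hour figure above in perspective, here is the
back-of-the-envelope arithmetic in Python. It assumes "1/2
megabyte" means 512 * 1024 bytes and "56 kb" means 56,000 bits
per second, and it ignores PPP, IP and TCP header overhead.

```python
def ideal_transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a link with no overhead or stalls."""
    return size_bytes * 8 / link_bits_per_second


FILE_BYTES = 512 * 1024        # assumed meaning of "1/2 megabyte"
LINK_BPS = 56_000              # assumed meaning of "56 kb"
OBSERVED_SECONDS = 2 * 60 * 60 # the "up to 2 hours" reported above

ideal = ideal_transfer_seconds(FILE_BYTES, LINK_BPS)
slowdown = OBSERVED_SECONDS / ideal
print(f"ideal: {ideal:.0f} s, observed is about {slowdown:.0f}x slower")
```

Under those assumptions the transfer should take roughly a minute
and a quarter, so the stalls account for nearly all of the
elapsed time.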

Will someone please look into this?

Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.30 on an i586 machine (66.15 BogoMips).
Warning : I read unsolicited mail for $350.00 per hour. Supply billing address.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-