TCP bugs in 2.0.29

Matthew Ghio (ghio@temp0030.myriad.ml.org)
Sun, 16 Feb 1997 19:40:21 -0500


There's a bug in the TCP driver for 2.0.29 which is really starting to
annoy me. I do not think this is a hardware problem. The system is
otherwise very stable, and had been running for several weeks before I
rebooted yesterday to upgrade to 2.0.29. After installing the new
kernel, the TCP bug is still present.

The problem seems to occur when a large amount of data is sent over a
TCP session (such as a large file transfer). The session then locks up
with a negative send queue, as follows:

> # netstat -tn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address          Foreign Address        State
> tcp        0  -3871 128.2.74.230:1022      192.100.81.115:513     ESTABLISHED
...
> # cat /proc/net/tcp
> sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
[snip]
> 254: E64A0280:03FE 735164C0:0201 01 FFFFF0E1:00000000 01:00000032 00000001 0 0 36909
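
As a cross-check, the /proc/net/tcp entry can be decoded by hand to
confirm it is the same connection netstat shows. The following is only
a minimal Python sketch. It assumes the address is printed as a 32-bit
word on a little-endian (x86) machine and that tx_queue and rx_queue
are 32-bit values that should be read as signed. The function names
are made up for illustration.

```python
import socket
import struct

# The entry quoted above, copied verbatim.
SAMPLE = ("254: E64A0280:03FE 735164C0:0201 01 FFFFF0E1:00000000 "
          "01:00000032 00000001 0 0 36909")


def decode_addr(field):
    """Turn 'E64A0280:03FE' into a (dotted-quad, port) pair."""
    hex_ip, hex_port = field.split(":")
    # Assumption: the word was printed on a little-endian host, so
    # packing it little-endian recovers the network byte order.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def to_signed32(hex_value):
    """Interpret an 8-digit hex string as a signed 32-bit integer."""
    value = int(hex_value, 16)
    return value - (1 << 32) if value & 0x80000000 else value


def decode_line(line):
    """Decode the address, state and queue fields of one entry."""
    fields = line.split()
    tx_queue, rx_queue = fields[4].split(":")
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),   # 1 is TCP_ESTABLISHED
        "tx_queue": to_signed32(tx_queue),
        "rx_queue": to_signed32(rx_queue),
    }


if __name__ == "__main__":
    print(decode_line(SAMPLE))
```

Under those assumptions the entry decodes to 128.2.74.230:1022 talking
to 192.100.81.115:513 in state 1 (ESTABLISHED), and tx_queue FFFFF0E1
comes out as -3871, the same negative Send-Q that netstat reports.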

Occasionally, I also get in my syslog:
> Feb 16 12:42:14 myriad kernel: TCP: **bug**: copy=0, sk->mss=0

Needless to say, I want to get this fixed.

It has occurred to me that I could strace some of the processes doing
network I/O, to try to see if I can find a specific sequence of writes
which will cause this, but it usually takes several hours before the
bug appears, and that's a lot more debugging output than I have disk
space for. Any other ideas?