The problem seems to occur when a large amount of data is sent over a
TCP session (such as a large file transfer). The session then locks up
with a negative Send-Q, as follows:
> # netstat -tn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0  -3871 128.2.74.230:1022       192.100.81.115:513      ESTABLISHED
...
> # cat /proc/net/tcp
> sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
[snip]
> 254: E64A0280:03FE 735164C0:0201 01 FFFFF0E1:00000000 01:00000032 00000001 0 0 36909
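The hex fields in the /proc/net/tcp entry above can be decoded by hand to
confirm that it is the same connection netstat reports. Below is a minimal
sketch in Python, not a definitive parser. It assumes the addresses are
printed as raw 32-bit values on a little-endian machine (so the bytes appear
reversed) and that tx_queue should be read as a signed 32-bit quantity. The
function names are hypothetical, chosen only for illustration.

```python
import socket
import struct


def decode_addr(field):
    """Decode a field such as 'E64A0280:03FE' into (dotted_ip, port)."""
    ip_hex, port_hex = field.split(":")
    # Assumption: the address is a 32-bit integer printed on a
    # little-endian host, so packing it little-endian restores the
    # network byte order that inet_ntoa expects.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def as_signed32(hex_str):
    """Reinterpret an 8-digit hex string as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value & (1 << 31) else value


def decode_entry(line):
    """Decode the address, state, and queue columns of one entry."""
    fields = line.split()
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),  # 1 is TCP_ESTABLISHED
        "tx_queue": as_signed32(tx_hex),
        "rx_queue": as_signed32(rx_hex),
    }


if __name__ == "__main__":
    sample = ("254: E64A0280:03FE 735164C0:0201 01 FFFFF0E1:00000000 "
              "01:00000032 00000001 0 0 36909")
    print(decode_entry(sample))
```

Run on the entry above, this gives 128.2.74.230:1022 and 192.100.81.115:513
in state 1, with tx_queue FFFFF0E1 reading as -3871 when treated as signed,
which matches the Send-Q value netstat shows.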
Occasionally, I also get this in my syslog:
> Feb 16 12:42:14 myriad kernel: TCP: **bug**: copy=0, sk->mss=0
Needless to say, I want to get this fixed.
It has occurred to me that I could strace some of the processes doing
network I/O to see whether I can find a specific sequence of writes
that causes this, but it usually takes several hours before the bug
appears, and that's a lot more debugging output than I have disk
space for. Any other ideas?