Our web server was up for three months on a 1.99.4 kernel before a
memory leak thrashed it to death while I took a weekend off a couple
of weeks ago. I thought that some of the TCP problems had been fixed
since then so I didn't think it was worth giving the gory details.
In case this problem is still relevant, I have the output of a
"netstat -not" done some days before it died. The connection states
break down as follows:
3 CLOSE
2 CLOSE_WAIT
532 CLOSING
9 ESTABLISHED
48 FIN_WAIT1
86 FIN_WAIT2
3 LAST_ACK
6 SYN_SENT
34 TIME_WAIT
Apart from that copy of the netstat output, I can't dig around any
more since it's been rebooted, of course. I would guess the kernel
memory leak (which ate up the 64MB of memory and 64MB of swap) was at least
partially due to network buffers (the send queue figures in that
netstat add up to 7MB). However, it's still running the same kernel
(I might schedule an upgrade to a recent 2.0.x in a couple of weeks)
so, assuming the problems recur, I can poke around a bit on the running
system. The same stats as above for current netstat output show:
# netstat -not | tail -n +3 | awk '{print $6}' | sort | uniq -c
1 CLOSE
2 CLOSE_WAIT
44 CLOSING
9 ESTABLISHED
10 FIN_WAIT1
23 FIN_WAIT2
2 LAST_ACK
5 SYN_SENT
22 TIME_WAIT
so if it *is* the same problem, maybe it's CLOSING we're talking about
rather than CLOSE or CLOSE_WAIT.
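
To see which states are actually holding the send queue memory, the same
pipeline can be extended to total the Send-Q column per state. This is only
a sketch: it assumes the "netstat -not" layout shown here (two header lines,
Send-Q in column 3, state in column 6), and the function name
sendq_by_state is made up for illustration.

```shell
# sendq_by_state (hypothetical helper): read "netstat -not" style output
# on stdin and print one line per TCP state:
#   STATE  connection-count  total-Send-Q-bytes
# Assumes two header lines, Send-Q in column 3 and state in column 6.
sendq_by_state() {
    tail -n +3 |
    awk '{ count[$6]++; bytes[$6] += $3 }
         END { for (s in count) print s, count[s], bytes[s] }' |
    sort
}
```

It would be run as "netstat -not | sendq_by_state". If CLOSING accounts for
most of the bytes as well as most of the connections, that would point the
same way as the counts above.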
--Malcolm
--
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Oxford University Computing Services
"Widget. It's got a widget. A lovely widget. A widget it has got." --Jack Dee