Problems with 2.1.29 on a heavily loaded server: CLOSING sockets

Gestor de Sistemes (pau@readysoft.es)
Wed, 12 Mar 1997 09:09:00 +0100 (MET)


Kernel 2.1.29 is working fine in a high traffic server.

Well... more or less, except for a small problem with sockets that
stay in the CLOSING state forever.

OTOH, the earlier problem of hundreds of sockets lying around for days
seems to have disappeared.

Now, the bug report:

There's been a socket in this state for 12 hours:
tcp 1 1 194.179.34.2:25 194.224.168.1:1726 CLOSING

where 194.224.168.1 is a Linux box too.

Suddenly I get messages like this on the console:
tcp_do_sendmsg1: EPIPE dude...

Looking in /var/log/messages, I find:
Mar 12 08:26:41 www kernel: >tcp_do_sendmsg1: EPIPE dude...
Mar 12 08:26:41 www kernel: tcp_do_sendmsg1: EPIPE dude...
Mar 12 08:26:41 www last message repeated 422 times
Mar 12 08:26:41 www kernel: tcp_do_sendms>tcp_do_sendmsg1: EPIPE dude...
Mar 12 08:26:41 www kernel: tcp_do_sendmsg1: EPIPE dude...
Mar 12 08:26:41 www last message repeated 248 times

As you can see, it happens hundreds of times within a single second:
more than 670 messages are logged at 08:26:41 alone.

Then sendmail dies and, obviously, when I try to restart it I see this
message in maillog:

Mar 12 08:37:16 www sendmail[9717]: NOQUEUE: SYSERR(root):
opendaemonsocket: cannot bind: Address already in use
Mar 12 08:37:16 www sendmail[9717]: problem creating SMTP socket

because the socket is still hanging there:
tcp 1 1 194.179.34.2:25 194.224.168.1:1726 CLOSING

So, I think there's something wrong in the network layer.
This is all I can report about this bug.

I don't know whether it is related, but I also get this sometimes:
Mar 12 07:48:39 www kernel: Socket destroy delayed (r=0 w=620)

And another one:
Mar 12 06:15:49 www kernel: Redirect from C2B32201/eth0 to C2B32207
ignored.Path = C2B32202 -> C2B32207, tos 00

If you need any tests, just ask.

Pau Aliagas
System Manager
Ready Soft