Blocking for too long in SO_LINGER...

From: Steve Kann (stevek@SteveK.COM)
Date: Thu Jan 27 2000 - 12:44:29 EST


        I saw a little bit of a conversation in the linux-kernel
mailing list between Alan Cox and Steven Clarke, and it seems to
describe the problem I'm having. It seems that the kernel is blocking
on close for sockets set to SO_LINGER for a _very_ long time..

        http://www.uwsg.indiana.edu/hypermail/linux/kernel/9909.2/0350.html

        Even where the LINGER was set (elsewhere in the code) it was set
        to 120 (units - seconds? - claims to be hundreths of seconds in
        the setsockopt man page but I think that's wrong). In the
        problem that I demonstrated, close took 11 minutes to return!

I've been using pnserver (crappy software) for a while on linux 2.0.x,
and it's for the most part worked without any problem. I have it now
set up on a new 2.2.12-20 (redhat kernel) machine, and it sometimes just
locks up for many minutes.

I did a strace on the process, and found that it was blocking on the
close() of a socket with SO_LINGER set. I'm not sure if strace is
reporting this properly, but I think that the timeout being set is 8
seconds (which seems like an awfully bad choice by real if so).

Anyways, I've fouind it actually blocking for _much_ longer than 8
seconds -- in fact, in once case, I

Here it is blocking for about 130 seconds:

setsockopt(23, SOL_SOCKET, SO_LINGER, [1], 8) = 0
time([949008208]) = 949008208
write(1, "170.153.121.76 - - [27/Jan/2000:"..., 155) = 155
close(23) = 0
send(14, "?", 1, 0) = 1
oldselect(39, [4 5 6 8 9 10 13 14 16 17 21], [], NULL, {0, 46775}) = 9 (in [4 8 9 10 13 14 16 17 21], left {0, 50000})
gettimeofday({949008338, 669581}, NULL) = 0

And here, I figure I better get this restarted after about 10 minutes,
before the people listening to the streams pack up and leave:

16:53:28.079382 setsockopt(13, SOL_SOCKET, SO_LINGER, [1], 8) = 0
16:53:28.079638 time([949010008]) = 949010008
16:53:28.079861 write(1, "170.153.121.102 - - [27/Jan/2000:21:53:28 +0000] \"GET c1/use"..., 163) = 163
16:53:28.080184 close(13) = 0
17:03:07.471132 --- SIGHUP (Hangup) ---
17:03:07.472156 gettimeofday({949010587, 472268}, NULL) = 0

Any idea what is happening here? Any kind of quick-fix I can do here:
(I have a high-proofile event tomorrow, and naturally, I don't have the
source to the silly application).. Maybe I can quickly get the kernel
to ignore SO_LINGER either entirely (and I guess I'll find out what
breaks), or for just particular processes.

-SteveK

-- 
    Steve Kann  - Horizon Live Distance Learning - 841 Broadway, Suite 502
   P:stevek@SteveK.COM - B:stevek@HorizonLive.com - R:KC2FBU (212) 533-1775
  "The box said 'Requires Windows 95, NT, or better,' so I installed Linux."

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:18 EST