Re: TCP accept bug in 2.0 and 2.0.34pre10 patch

David S. Miller (davem@dm.cobaltmicro.com)
Wed, 22 Apr 1998 01:03:33 -0700


Date: Tue, 21 Apr 1998 16:14:31 +0200 (MET DST)
From: Jaroslav Kysela <perex@jcu.cz>

> I found a strange TCP accept bug in 2.0 kernels (including
> 2.0.34pre10). The list of waiting TCP connections (socket ->
> receive_queue) can at times be corrupted by interrupts.
>
> This bug can produce "zombie" sockets which are never
> passed to user space via accept(), but stay in the kernel forever or
> until some timeout expires.

Are you sure? Please show me the code path which leads to the
corruption of the receive_queue for a listening socket.

1) All interrupt paths in the networking code check whether the
socket is locked (via the sk->users count; see tcp_rcv() for example).

2) The only other place sk->receive_queue is touched from a TCP input
routine is during release_sock() backlog emptying. This happens
without the socket lock, but inside a start_bh_atomic() sequence, so
no other packet input processing can occur and touch the
receive_queue.

3) All non-interrupt code which touches the receive_queue of a sock
does so with the socket locked (look at the tcp_accept() code
you changed: at all critical moments, lock_sock(sk) has been
done).
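
To make the three rules concrete, here is a rough user-space sketch of
the discipline as I understand it. This is NOT the 2.0.x source: every
toy_* name is made up for illustration, the model is single-threaded,
and the start_bh_atomic()/end_bh_atomic() calls appear only as comments
marking where the real kernel would exclude bottom-half processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; the real kernel uses sk_buff queues. */
struct toy_pkt   { struct toy_pkt *next; int id; };
struct toy_queue { struct toy_pkt *head, *tail; int len; };

struct toy_sock {
    int users;                      /* nonzero: process context holds the lock */
    struct toy_queue receive_queue; /* what accept() consumes */
    struct toy_queue back_log;      /* packets deferred while locked */
};

void toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;
}

struct toy_pkt *toy_dequeue(struct toy_queue *q)
{
    struct toy_pkt *p = q->head;
    if (!p)
        return NULL;
    q->head = p->next;
    if (!q->head)
        q->tail = NULL;
    q->len--;
    return p;
}

/* Rule 1: the input ("interrupt") path never touches receive_queue
 * while the socket is locked; it defers to the backlog instead. */
void toy_rcv(struct toy_sock *sk, struct toy_pkt *p)
{
    if (sk->users)
        toy_enqueue(&sk->back_log, p);
    else
        toy_enqueue(&sk->receive_queue, p);
}

void toy_lock_sock(struct toy_sock *sk)
{
    sk->users++;
}

/* Rule 2: the backlog is drained on release, after the lock count is
 * dropped but with input processing held off. */
void toy_release_sock(struct toy_sock *sk)
{
    struct toy_pkt *p;

    /* start_bh_atomic() would go here in the kernel */
    sk->users--;
    if (sk->users == 0) {
        while ((p = toy_dequeue(&sk->back_log)) != NULL)
            toy_enqueue(&sk->receive_queue, p);
    }
    /* end_bh_atomic() would go here in the kernel */
}

/* Rule 3: process context only touches receive_queue under the lock. */
struct toy_pkt *toy_accept(struct toy_sock *sk)
{
    struct toy_pkt *p;

    toy_lock_sock(sk);
    p = toy_dequeue(&sk->receive_queue);
    toy_release_sock(sk);
    return p;
}
```

If all three rules hold on every path, a packet arriving while the
socket is locked can only ever land on the backlog, so the
receive_queue is never modified from two contexts at once. A real
corruption would have to come from a path that breaks one of them.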

I'm going to be real difficult about this, because I spent numerous
non-stop weeks looking for any and all conditions which could lead to
this, and even after seeing your patch, I still don't know where it
could possibly occur in the 2.0.x kernel. Please teach me how it can
happen ;-)

If you figure it out, it might even lead to a fix for the nasty
"tcp_recvmsg() OOPS" many squid users see, or even this fix you have
here could be what kills that bug ;-)

Later,
David S. Miller
davem@dm.cobaltmicro.com
