Re: Fix for thread+network crashes in 2.0/2.1?

Henner Eisen (eis@baty.hanse.de)
Sat, 28 Feb 1998 22:49:03 +0100


Hi,

torvalds@transmeta.com (Linus Torvalds) writes:

>
> Anyway, I'm releasing a 2.1.89-3 on ftp.kernel.org under the "testing"
> subdirectory, and I'd be very happy if people would test it. I don't
> guarantee that this patch works at all - for all I know it might be
> totally broken, and the only thing I guarantee is that (a) it compiles
> with my particular setup and (b) it looks like it should work and makes
> sense.
>
> Please do test, and comment,
>

Test
====

Well, I tried. But with the x.25-based telnetd, 2.1.89-3 still freezes
when select() calls free_wait(). (The corrupted wait queues residing in
already cleaned inodes).

Comments
========

When I tried to debug the free_wait() oops occuring in the x.25 telentd's
select() call, my final suspection was that the corrupted wait queues
are caused by inconsistent inode use counts. When a socket (struct socket)
is created by means of sys_socket() the following function is called:
(line numbers from 2.1.88)

net/socket.c

194 static int get_fd(struct inode *inode)
195 {
196 int fd;
197
198 /*
199 * Find a file descriptor suitable for return to the user.
200 */
:
217
218 /*
219 * The socket maintains a reference to the inode, so we
220 * have to increment the count.
221 */
222 inode->i_count++;
223

I think increasing the i_count is obsolete. In the current kernel sources,
there seem to be no stand-alone instances of struct socket. All instances
are in the inode:

include/linux/fs.h

320 struct inode {
:
358 union {
:
375 struct socket socket_i;
:
377 } u;
378 };

But instances of struct sock maintain a reference to the inode. Not directly,
but there are (at least) two indirect references:

include/net/sock.c

344 struct sock
345 {
:
402 struct wait_queue **sleep;
:
531 struct socket *socket;
:
550 };

Thus, socket (when set) points to a storage area which belongs to the
inode instance. And sleep (when set) allways points to the member wait of

61 struct socket
62 {
:
71 struct wait_queue *wait;
:
76 };

which also belongs to the inode instance.

However, when new instances of struct sock are created, the inode use count
is newer increased. This probably does not matter as long as only one instance
of struct sock refers to a particular inode (the obsolete use count mentionned
above compensates for this). Notice that there might be (i.e. during an
accept()), two instances of struct sock that refer the same socket/inode.

When the socket is closed, i_count is decremented

net/socket.c

302 void sock_release(struct socket *sock)
:
314 iput(sock->inode);
315 }

However, if there is a second instance of struct sock that refers the
inode or if destruction of struct sock is delayed, there are still references
to the inode (indirectly by means of sock::sleep or sock::socket) are still
references. But i_count doesn't know about that that and the i_node may be
freed leaving wild pointers in a struckt sock instance.

What I've done in af_x25.c was increasing the socket inode's i_count each time
when a new struct sock::sleep is set and doing an iput just before
sk_free() is called. This cures the at least the symptoms (even if I remove
the i_count++ and iput() from net/socket.c).

Henner

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu