Re: Oops in 2.2.15pre7

From: Oleg Drokin (green@ccssu.ccssu.crimea.ua)
Date: Sun Feb 20 2000 - 05:41:44 EST


Hello!

Alan Cox wrote:

> > Nope, I think that tulip has nothing to do with that.
> > Try patch at the end of this letter, it definitely should help you.
> It should make absolutely no odds.
Indeed. I looked to wrong fcntl.c, that from 2.3.46 :-\

> > ALAN: What do you think about backporting of locking on Fasync_list
> > from 2.3 to 2.2, is that needed?
> That actually looks like what is needed
Ok. I'll dig it.

> > + if(sock->fasync_list != NULL)
> > + kill_fasync(sock->fasync_list, SIGIO);
> The kill_fasync code starts
> while(fa)
True.

> so it checks this. For that oops trace to be right it would imply someone
> zapped the list. The list is only changed by fasync_helper. That locks the
Ah, I found it in tty_io.c, but comment stating that
/*
 * fasync_helper() is used by some character device drivers (mainly mice)
 * to set up the fasync queue. It returns negative on error, 0 if it did
 * no changes and positive if it added/deleted the entry.
 */
Moreover, we have net/socket.c::sock_fasync, which does almost the same
(and with almost the same cde, only locking differs) see below for that.

> list when doing the two critical updates for adding/removing an entry and
> seems to have a small race. I find it hard to believe this is what is going
> on but unless I am missing something
> save_flags(flags);
> cli();
> *fp = fa->fa_next;
> restore_flags(flags);
> kfree(fa);
> return 1;
> is safe even SMP since the cli/restore_flags means that we cannot be part
> way down a queue walk when the entry is freed.
Hmm... This is verystrange to my view, here is assembler dump from kill_fasync:
     mov 0x8(%esp,1),%ebx /* That's fa, ut's stored on stack
        so that is anybody changes real value, this one will not change */
     test %ebx,%ebx /* test fa,for not null */
     je 0xc012c6fd <kill_fasync+65> /* if null -> go out of loop */
     lea 0x0(%esi),%esi /* not sure for this */
     cmpl $0x4601,(%ebx) /* dereference ebx (which is not null!.
                                    we tested it!) */
And we got an OOPS at last operation.
Looking at OOPS, I see that %ebx = 0x15, kinda odd.
 
What I also see, in socket.c::sock_fasync:
        lock_sock(sock->sk);
...
                sock->fasync_list=fna;
...
                        *prev=fa->fa_next;
                        kfree_s(fa,sizeof(struct fasync_struct));
...
        release_sock(sock->sk);
But, e.g. sock_wake_async does not pay attention to sock->sk (locked, or not)
at all. particular oops trace does not look like this case, but still I think
we can get in situation where we kfree_s our struct fasync_struct just
under feet of another process performing operations on it.

What do you say about this?

--- net/socket.c.orig Sat Feb 19 23:15:54 2000
+++ net/socket.c Sun Feb 20 12:40:07 2000
@@ -561,7 +561,9 @@
                 /* fall through */
         case 0:
         call_kill:
+ lock_sock(sock->sk);
                 kill_fasync(sock->fasync_list, SIGIO);
+ release_sock(sock->sk);
                 break;
         }
         return 0;
Bye,
    Oleg

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Feb 23 2000 - 21:00:25 EST