Re: Linux 2.1.118 SMP problem

Linus Torvalds (torvalds@transmeta.com)
Wed, 26 Aug 1998 14:35:27 -0700 (PDT)


On Wed, 26 Aug 1998, Alan Cox wrote:
>
> I can see this deadlock for 2 or more CPUs in the sunrpc code
>
> It's
>
> CPU #0
> nfs->sunrpc->udp_sendmsg->lock_sock->copying
> irq,bh
> net_bh->udp_recv->rpc callback->sk_recv_datagram->lock_sock => Deadlock

If the socket is locked, then it shouldn't call udp_recv, but put the new
datagram on the backlog instead.
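
As a rough user-space sketch of that backlog rule (not the actual kernel
code): every name below (toy_sock, toy_pkt, toy_bh_input and friends) is
hypothetical and only models the idea. The bh side never waits on a locked
socket; it queues the datagram, and the lock holder processes the queue when
it lets go of the socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a datagram and a socket; not the kernel's types. */
struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_sock {
    int locked;                   /* nonzero while process context owns the socket */
    struct toy_pkt *backlog_head; /* datagrams deferred while the socket was locked */
    struct toy_pkt *backlog_tail;
    int delivered;                /* how many datagrams reached the protocol */
    int last_id;                  /* id of the most recently delivered datagram */
};

/* Stand-in for the protocol receive routine (udp_recv in the trace above). */
static void toy_proto_recv(struct toy_sock *sk, struct toy_pkt *p)
{
    sk->delivered++;
    sk->last_id = p->id;
}

/* Bottom-half side: never wait for the lock, defer to the backlog instead. */
static void toy_bh_input(struct toy_sock *sk, struct toy_pkt *p)
{
    if (sk->locked) {
        p->next = NULL;
        if (sk->backlog_tail)
            sk->backlog_tail->next = p;
        else
            sk->backlog_head = p;
        sk->backlog_tail = p;
        return;
    }
    toy_proto_recv(sk, p);
}

/* Process side: take ownership of the socket. */
static void toy_lock_sock(struct toy_sock *sk)
{
    assert(!sk->locked);
    sk->locked = 1;
}

/* Process side: deliver whatever arrived while we held the socket, then release. */
static void toy_release_sock(struct toy_sock *sk)
{
    assert(sk->locked);
    while (sk->backlog_head) {
        struct toy_pkt *p = sk->backlog_head;
        sk->backlog_head = p->next;
        toy_proto_recv(sk, p);
    }
    sk->backlog_tail = NULL;
    sk->locked = 0;
}
```

With this shape the receive path cannot block behind the sender: a datagram
that arrives during the copy is simply delivered late, in arrival order, when
the socket is released.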

> I'm not sure why it should occur SMP only or if it's the one you see, but that
> does look like a deadlock to me. The lock_sock code also claims it's
> 'a very broken bottom half synchronization mechanism' which worries me
> even more.

The lock_sock() code works, it's just fairly inefficient on SMP (it's
really efficient on UP). The main cause of inefficiency is that the "socket
locked" case really has to wait for all bh's right now, simply because it
doesn't know which bh is in the critical region (or when). So it
does a "bh_synchronize()" even though that is potentially _way_ too much
synchronization (it may end up waiting for a timer bh to run, even though
the timer bh doesn't actually touch the socket at all).

So it's really a granularity issue rather than a correctness issue.
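
To make the granularity point concrete, here is a toy model, assuming a
made-up table of bottom halves (toy_bh and both helper functions are
hypothetical, nothing here is a kernel interface). It only counts how many
running bh's a locker would have to wait for under a global synchronize
versus a per-socket one.

```c
#include <stddef.h>

/* Hypothetical description of one bottom half; not a kernel structure. */
struct toy_bh {
    const char *name;
    int running;        /* nonzero if it is executing on some CPU right now */
    const void *sock;   /* socket it is working on, or NULL if it touches none */
};

/* Coarse: wait for every running bh, whatever it is doing. */
static size_t toy_waits_coarse(const struct toy_bh *bhs, size_t n)
{
    size_t waits = 0;

    for (size_t i = 0; i < n; i++)
        if (bhs[i].running)
            waits++;
    return waits;
}

/* Fine: wait only for running bh's that are inside this socket's critical region. */
static size_t toy_waits_fine(const struct toy_bh *bhs, size_t n, const void *sock)
{
    size_t waits = 0;

    for (size_t i = 0; i < n; i++)
        if (bhs[i].running && bhs[i].sock == sock)
            waits++;
    return waits;
}
```

With a running timer bh that touches no socket and a running net bh working
on the socket in question, the coarse count is 2 and the fine count is 1.
Both variants are equally correct; the coarse one just waits for more than it
needs to.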

IF the NFS queueing really has the deadlock you point out, it should
certainly be a deadlock on UP too. It may be that it only shows up on SMP
for some magic timing reason rather than anything else.

Linus
