Re: Linux 2.1.118 SMP problem

Alan Cox (alan@lxorguk.ukuu.org.uk)
Wed, 26 Aug 1998 23:21:49 +0100 (BST)


> On the serial console we get the message
>
> wait_on_bh, CPU 0:
> irq: 1 [0 1]
> bh: 1 [0 1]
> <[c0113c4f]> <[c0175342]> <[c0175424]> <[c0148761]>
>
> repeating every few seconds.
>
> System.map says:
>
> del_timer __rpc_wake_up rpc_wake_up_task nfs_updatepage

It's waiting for a bh to clear for the synchronize_bh in timer.c
because the other CPU has deadlocked in an IRQ or BH. How many
processors does this machine have?

I can see this deadlock for 2 or more CPUs in the sunrpc code.

It's:

CPU #0
nfs->sunrpc->udp_sendmsg->lock_sock->copying
irq,bh
net_bh->udp_recv->rpc callback->sk_recv_datagram->lock_sock => Deadlock

CPU #1
spots it

That means someone has to fix the UDP callback cases for the NFS sockets.
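
To make the sequence above concrete, here is a minimal userspace sketch in C.
It is not kernel code: toy_sock, toy_lock_sock and the other toy_* names are
hypothetical stand-ins, and the lock is modelled as a plain flag so that the
"would spin forever" case is reported instead of actually hanging. It assumes
the bottom half runs on the same CPU while the process-context path still
holds the socket lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct toy_sock {
    bool locked;   /* true while some context holds the socket lock */
};

/* Try to take the lock. Returns false where a real non-recursive
 * lock would wait forever because the holder can never run. */
bool toy_lock_sock(struct toy_sock *sk)
{
    if (sk->locked)
        return false;
    sk->locked = true;
    return true;
}

void toy_release_sock(struct toy_sock *sk)
{
    assert(sk->locked);   /* releasing an unheld lock is a bug */
    sk->locked = false;
}

/* Models: net_bh -> udp receive -> rpc callback -> lock_sock.
 * Returns false if the callback would deadlock. */
bool toy_rpc_callback_from_bh(struct toy_sock *sk)
{
    if (!toy_lock_sock(sk))
        return false;     /* lock holder is the context we interrupted */
    toy_release_sock(sk);
    return true;
}

/* Models: nfs -> sunrpc -> udp_sendmsg -> lock_sock -> copying.
 * If a bottom half fires during the copy, it runs on top of the
 * lock holder. Returns true if that would deadlock. */
bool toy_sendmsg(struct toy_sock *sk, bool bh_fires_during_copy)
{
    bool deadlocked = false;
    bool got = toy_lock_sock(sk);

    assert(got);          /* sketch assumes the socket starts unlocked */
    (void)got;

    if (bh_fires_during_copy)
        deadlocked = !toy_rpc_callback_from_bh(sk);

    toy_release_sock(sk);
    return deadlocked;
}
```

In this model toy_sendmsg(&sk, false) completes normally, while
toy_sendmsg(&sk, true) reports the deadlock: the callback needs a lock whose
holder cannot make progress until the callback returns.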

I'm not sure why it should occur on SMP only, or whether it's the one you
are seeing, but that does look like a deadlock to me. The lock_sock code
also describes itself as 'a very broken bottom half synchronization
mechanism', which worries me even more.

Also, that bug has been there for a lot longer than the past 2 or 3 kernels.

Alan
