Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency

From: Ingo Molnar
Date: Thu Feb 02 2006 - 08:53:58 EST



* Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, Feb 02, 2006 at 11:54:29AM +0100, Ingo Molnar wrote:
> >
> > hm, i got a new one:
> >
> > ============================================
> > [ BUG: circular locking deadlock detected! ]
> > --------------------------------------------
> > sshd/28997 is trying to acquire lock:
> > (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
> >
> > but task is already holding lock:
> > (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
> >
> > which lock already depends on the new lock,
> > which could lead to circular deadlocks!
> >
> > the dependency chain (in reverse order) is:
> > -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> > -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> > -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
>
> I believe this is a false positive and I think I can see where it went
> wrong. The dependency between #0 and #1 is the broken premise.
>
> The validator is probably putting all sk_lock's in the same basket.
> That is, it's mixing up the socket locks for TCP, UDP as well as
> AF_PACKET. While it is true that TCP and UDP's sk_lock may sit
> outside queue_lock, AF_PACKET never transmits while holding its
> sk_lock.

you are right, the validator indeed treated all these protocols as one,
so this is a false positive. (there are two slock buckets already,
because there is some natural lock nesting between listen sockets and
ordinary sockets, but otherwise you are right that all net protocols
were indeed treated as one category.)

> So the #0 => #1 dependency shouldn't exist. Can you get the validator
> to print out the reasoning for the #0 => #1 dependency? That should
> clarify the problem.

the slock -> queue_lock dependency was first observed at:

-> (&sk->sk_lock.slock/1){-+}
-> &dev->queue_lock [<c0b98137>] dev_queue_xmit+0xc3/0x21e

so it comes from dev_queue_xmit(). No further information about where
that was done - but i suspect it was TCP's sendmsg. af_packet.c never
seems to be doing that.

The queue_lock -> xmit_lock dependency was first observed at:

-> (&dev->queue_lock){-+}
-> &dev->xmit_lock [<c0bb0827>] dev_activate+0xc8/0xe5

i think i'll solve this by splitting af_packet.c's slock into a separate
bucket (separate lock type). (Such exceptions and locking assymetries
can be expressed towards the validator in a pretty straightforward way,
by initializing those locks in a special way.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/