Re: 3.11-rc6 genetlink locking fix offends lockdep

From: Johannes Berg
Date: Mon Aug 19 2013 - 04:00:30 EST



> 3.11-rc6's commit 58ad436fcf49 ("genetlink: fix family dump race")
> gives me the lockdep trace below at startup.

Hmm. Yes, I see now how this happens, not sure why I didn't run into it.

The problem is that genl_family_rcv_msg() is called with the genl_lock
held, and then calls netlink_dump_start() with it held, creating a
genl_lock->cb_mutex dependency. The dump continuation runs the other way
around: netlink holds cb_mutex while calling ctrl_dumpfamily(), which
now takes genl_lock, giving cb_mutex->genl_lock.
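
To make the inversion concrete, here is a minimal userspace sketch of the
two acquisition orders. It assumes a toy dependency tracker that is far
simpler than lockdep; the enum and the tracker_reset(), track_acquire()
and track_release() helpers are hypothetical names for illustration, not
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks discussed above. */
enum lock_class { GENL_MUTEX, CB_MUTEX, NR_CLASSES };

/* dep[a][b] is true once b has been acquired while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];
static enum lock_class held[NR_CLASSES];
static int nr_held;

static void tracker_reset(void)
{
	memset(dep, 0, sizeof(dep));
	nr_held = 0;
}

/* Is "to" reachable from "from" by following recorded dependencies? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record that class c is taken while everything in held[] is held.
 * Returns false if a new held->c edge closes a cycle, i.e. some earlier
 * code path already took these locks in the opposite order.
 */
static bool track_acquire(enum lock_class c)
{
	bool ok = true;

	assert(c < NR_CLASSES && nr_held < NR_CLASSES);
	for (int i = 0; i < nr_held; i++) {
		bool seen[NR_CLASSES] = { false };

		if (reachable(c, held[i], seen))
			ok = false;
		dep[held[i]][c] = true;
	}
	held[nr_held++] = c;
	return ok;
}

/* Locks must be released in reverse order of acquisition in this sketch. */
static void track_release(enum lock_class c)
{
	assert(nr_held > 0 && held[nr_held - 1] == c);
	nr_held--;
}
```

Replaying the request path (GENL_MUTEX, then CB_MUTEX) followed by the
dump continuation (CB_MUTEX, then GENL_MUTEX) makes the second
track_acquire(GENL_MUTEX) return false, which is the same kind of
circular dependency lockdep complains about.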

We could take cb_lock (the rw-semaphore) for reading instead, as in the
patch below, I believe, but I don't understand the mutex vs. semaphore
interaction well enough to be sure that's correct.

johannes


diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f85f8a2..6cfa646 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -792,7 +792,7 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct netlink_callback *cb)
 	bool need_locking = chains_to_skip || fams_to_skip;

 	if (need_locking)
-		genl_lock();
+		down_read(&cb_lock);

 	for (i = chains_to_skip; i < GENL_FAM_TAB_SIZE; i++) {
 		n = 0;
@@ -815,7 +815,7 @@ errout:
 	cb->args[1] = n;

 	if (need_locking)
-		genl_unlock();
+		up_read(&cb_lock);

 	return skb->len;
 }

