[PATCH] tipc: checking returns and Re: Possible Circular Locking in TIPC

From: Jarek Poplawski
Date: Thu Dec 28 2006 - 07:16:14 EST


On 22-12-2006 15:28, Eric Sesterhenn wrote:
> hi,
>
> while running my usual stuff on 2.6.20-rc1-git5, sfuzz (http://www.digitaldwarf.be/products/sfuzz.c)
> did the following, to produce the lockdep warning below:
...
> Here is the stacktrace:
>
> [ 313.239556] =======================================================
> [ 313.239718] [ INFO: possible circular locking dependency detected ]
> [ 313.239795] 2.6.20-rc1-git5 #26
> [ 313.239858] -------------------------------------------------------
> [ 313.239929] sfuzz/4133 is trying to acquire lock:
> [ 313.239996] (ref_table_lock){-+..}, at: [<c04cd0a9>] tipc_ref_discard+0x29/0xe0
> [ 313.241101]
> [ 313.241105] but task is already holding lock:
> [ 313.241225] (&table[i].lock){-+..}, at: [<c04cb7c0>] tipc_deleteport+0x40/0x1a0
> [ 313.241524]
> [ 313.241528] which lock already depends on the new lock.
> [ 313.241535]
> [ 313.241709]
> [ 313.241713] the existing dependency chain (in reverse order) is:
> [ 313.241837]
> [ 313.241841] -> #1 (&table[i].lock){-+..}:
> [ 313.242096] [<c01366c5>] __lock_acquire+0xd05/0xde0
> [ 313.242562] [<c0136809>] lock_acquire+0x69/0xa0
> [ 313.243013] [<c04d4040>] _spin_lock_bh+0x40/0x60
> [ 313.243476] [<c04cd1cb>] tipc_ref_acquire+0x6b/0xe0
> [ 313.244115] [<c04cac73>] tipc_createport_raw+0x33/0x260
> [ 313.244562] [<c04cbe21>] tipc_createport+0x41/0x120
> [ 313.245007] [<c04c57ec>] tipc_subscr_start+0xcc/0x120
> [ 313.245458] [<c04bdb56>] process_signal_queue+0x56/0xa0
> [ 313.245906] [<c011ea18>] tasklet_action+0x38/0x80
> [ 313.246361] [<c011ecbb>] __do_softirq+0x5b/0xc0
> [ 313.246817] [<c01060e8>] do_softirq+0x88/0xe0
> [ 313.247450] [<ffffffff>] 0xffffffff
> [ 313.247894]
> [ 313.247898] -> #0 (ref_table_lock){-+..}:
> [ 313.248155] [<c0136415>] __lock_acquire+0xa55/0xde0
> [ 313.248601] [<c0136809>] lock_acquire+0x69/0xa0
> [ 313.249037] [<c04d4100>] _write_lock_bh+0x40/0x60
> [ 313.249486] [<c04cd0a9>] tipc_ref_discard+0x29/0xe0
> [ 313.249922] [<c04cb7da>] tipc_deleteport+0x5a/0x1a0
> [ 313.250543] [<c04cd4f8>] tipc_create+0x58/0x160
> [ 313.250980] [<c0431cb2>] __sock_create+0x112/0x280
> [ 313.251422] [<c0431e5a>] sock_create+0x1a/0x20
> [ 313.251863] [<c04320fb>] sys_socket+0x1b/0x40
> [ 313.252301] [<c0432a72>] sys_socketcall+0x92/0x260
> [ 313.252738] [<c0102fd0>] syscall_call+0x7/0xb
> [ 313.253175] [<ffffffff>] 0xffffffff
> [ 313.253778]
> [ 313.253782] other info that might help us debug this:
> [ 313.253790]
> [ 313.253956] 1 lock held by sfuzz/4133:
> [ 313.254019] #0: (&table[i].lock){-+..}, at: [<c04cb7c0>] tipc_deleteport+0x40/0x1a0
> [ 313.254346]
> [ 313.254351] stack backtrace:
> [ 313.254470] [<c01045da>] show_trace_log_lvl+0x1a/0x40
> [ 313.254594] [<c0104d72>] show_trace+0x12/0x20
> [ 313.254711] [<c0104e79>] dump_stack+0x19/0x20
> [ 313.254829] [<c013480f>] print_circular_bug_tail+0x6f/0x80
> [ 313.254952] [<c0136415>] __lock_acquire+0xa55/0xde0
> [ 313.255070] [<c0136809>] lock_acquire+0x69/0xa0
> [ 313.255188] [<c04d4100>] _write_lock_bh+0x40/0x60
> [ 313.255315] [<c04cd0a9>] tipc_ref_discard+0x29/0xe0
> [ 313.255435] [<c04cb7da>] tipc_deleteport+0x5a/0x1a0
> [ 313.255565] [<c04cd4f8>] tipc_create+0x58/0x160
> [ 313.255687] [<c0431cb2>] __sock_create+0x112/0x280
> [ 313.255811] [<c0431e5a>] sock_create+0x1a/0x20
> [ 313.255942] [<c04320fb>] sys_socket+0x1b/0x40
> [ 313.256059] [<c0432a72>] sys_socketcall+0x92/0x260
> [ 313.256179] [<c0102fd0>] syscall_call+0x7/0xb
> [ 313.256300] =======================
>
> Greetings, Eric

Hello,

Maybe I misinterpret this but, IMHO lockdep
complains about locks acquired in different
order: tipc_ref_acquire() gets ref_table_lock
and then tipc_ret_table.entries[index]->lock,
but tipc_deleteport() inversely (with:
tipc_port_lock() and tipc_ref_discard()).
I hope maintainers will decide the correct
order.

Btw. there is a problem with tipc_ref_discard():
it should be called with tipc_port_lock, but
how to discard a ref if this lock can't be
acquired? Is it OK to call it without the lock
like in subscr_named_msg_event()?

Btw. #2: during this checking I've found
two places where return values from
tipc_ref_lock() and tipc_port_lock() are not
checked, so I attach a patch proposal for
this (compiled but not tested):

Regards,
Jarek P.
---

[PATCH] tipc: checking returns from locking functions

Checking of return values from tipc_ref_lock()
and tipc_port_lock() added in 2 places.

Signed-off-by: Jarek Poplawski <jarkao2@xxxxx>
---

diff -Nurp linux-2.6.20-rc2-/net/tipc/port.c linux-2.6.20-rc2/net/tipc/port.c
--- linux-2.6.20-rc2-/net/tipc/port.c 2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.20-rc2/net/tipc/port.c 2006-12-28 11:05:17.000000000 +0100
@@ -238,7 +238,12 @@ u32 tipc_createport_raw(void *usr_handle
return 0;
}

- tipc_port_lock(ref);
+ if (!tipc_port_lock(ref)) {
+ tipc_ref_discard(ref);
+ warn("Port creation failed, reference table invalid\n");
+ kfree(p_ptr);
+ return 0;
+ }
p_ptr->publ.ref = ref;
msg = &p_ptr->publ.phdr;
msg_init(msg, DATA_LOW, TIPC_NAMED_MSG, TIPC_OK, LONG_H_SIZE, 0);
diff -Nurp linux-2.6.20-rc2-/net/tipc/subscr.c linux-2.6.20-rc2/net/tipc/subscr.c
--- linux-2.6.20-rc2-/net/tipc/subscr.c 2006-12-18 09:01:04.000000000 +0100
+++ linux-2.6.20-rc2/net/tipc/subscr.c 2006-12-28 11:31:27.000000000 +0100
@@ -499,7 +499,12 @@ static void subscr_named_msg_event(void

/* Add subscriber to topology server's subscriber list */

- tipc_ref_lock(subscriber->ref);
+ if (!tipc_ref_lock(subscriber->ref)) {
+ warn("Subscriber rejected, unable to find port\n");
+ tipc_ref_discard(subscriber->ref);
+ kfree(subscriber);
+ return;
+ }
spin_lock_bh(&topsrv.lock);
list_add(&subscriber->subscriber_list, &topsrv.subscriber_list);
spin_unlock_bh(&topsrv.lock);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/