lockdep: Not able to find what holds lock.

From: Ben Greear
Date: Wed Oct 09 2024 - 14:15:24 EST


Hello,

I'm debugging a lockup related to wifi, and have some questions on lockdep.

Part of the output looks like this. From what I can tell, the first kworker (kworker/0:1/10)
holds rtnl_mutex and is blocked trying to acquire wiphy.mtx.

Showing all locks held in the system:
4 locks held by kworker/0:1/10:
#0: ffff888110061548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0xf16/0x1750
#1: ffff888112bbfd80 (reg_work){+.+.}-{0:0}, at: process_one_work+0x7d6/0x1750
#2: ffffffff856a2ee8 (rtnl_mutex){+.+.}-{4:4}, at: reg_todo+0x13/0x770 [cfg80211]
#3: ffff888144150768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: reg_process_self_managed_hints+0x6b/0x180 [cfg80211]
3 locks held by kworker/u32:0/11:
#0: ffff888110071948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0xf16/0x1750
#1: ffff888112bcfd80 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work+0x7d6/0x1750
#2: ffffffff856a2ee8 (rtnl_mutex){+.+.}-{4:4}, at: linkwatch_event+0x5/0x50
4 locks held by kworker/u32:1/66:
1 lock held by khungtaskd/67:
...
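To double-check a claim like "nothing else shows wiphy.mtx held" across a long dump, a tiny userspace scanner over the saved log can help. This is a minimal sketch, assuming the dump format shown above ("N locks held by TASK:" headers followed by "#N: ..." entries). struct lock_user and find_lock_users() are hypothetical helpers, not part of any kernel tool. It can only report what lockdep actually printed, so tasks whose held locks were skipped will not show up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One task in a lockdep dump that lists the lock being searched for. */
struct lock_user {
	char task[64];	/* e.g. "kworker/0:1/10" */
	int index;	/* the "#N" held-lock index on that task */
};

/*
 * Scan the lines of a "Showing all locks held in the system" dump and
 * record every task that lists a lock whose line contains @needle (a
 * class name such as "wiphy.mtx" or a lock address). Task headers look
 * like "4 locks held by kworker/0:1/10:"; entries look like " #3: ...".
 * Returns the number of matches stored in @out (at most @max).
 */
static int find_lock_users(const char *const *lines, int nlines,
			   const char *needle, struct lock_user *out, int max)
{
	char task[64] = "";
	int n = 0;

	assert(needle && *needle);
	for (int i = 0; i < nlines; i++) {
		const char *by = strstr(lines[i], " held by ");
		int idx;

		if (by) {
			size_t len;

			/* Task names can contain ':', so drop only the final one. */
			snprintf(task, sizeof(task), "%s", by + strlen(" held by "));
			len = strlen(task);
			if (len && task[len - 1] == ':')
				task[len - 1] = '\0';
			continue;
		}
		if (n < max && sscanf(lines[i], " #%d:", &idx) == 1 &&
		    strstr(lines[i], needle)) {
			snprintf(out[n].task, sizeof(out[n].task), "%s", task);
			out[n].index = idx;
			n++;
		}
	}
	return n;
}
```

Run over the excerpt above with "wiphy.mtx", it finds only kworker/0:1/10 at #3; searching by the address ffffffff856a2ee8 finds both rtnl_mutex entries. Matching on the address rather than the class name distinguishes instances when several wiphys exist.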

Nothing else in the printout shows wiphy.mtx as held, but some tasks are skipped, apparently
because they are running but are not the current task (for example, kworker/u32:1/66 reports
4 locks held, yet none are listed). Is there any real harm in a patch like this one, which prints
those locks anyway with an "unreliable" marker, so we might see what actually holds the lock in question?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3468d8230e5f..fc7c1c7e0d8f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -781,6 +781,7 @@ static void print_lock(struct held_lock *hlock)
 static void lockdep_print_held_locks(struct task_struct *p)
 {
 	int i, depth = READ_ONCE(p->lockdep_depth);
+	const char *rnc = "";
 
 	if (!depth)
 		printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
@@ -792,9 +793,10 @@ static void lockdep_print_held_locks(struct task_struct *p)
 	 * and it's not the current task.
 	 */
 	if (p != current && task_is_running(p))
-		return;
+		rnc = " Not reliable: Running-Not-Current: ";
+
 	for (i = 0; i < depth; i++) {
-		printk(" #%d: ", i);
+		printk(" %s#%d: ", rnc, i);
 		print_lock(p->held_locks + i);
 	}
 }
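To preview what the patched output looks like without rebuilding a kernel, here is a userspace mock of the modified loop. struct sim_task and sim_print_held_locks() are hypothetical stand-ins for task_struct and lockdep_print_held_locks(); they model only the control flow and the format string from the patch, not lockdep's real data or the concurrent updates that make a running task's list unreliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the task_struct fields the loop looks at. */
struct sim_task {
	const char *comm;
	int pid;
	bool running;		/* models task_is_running(p) */
	int depth;		/* models p->lockdep_depth */
	const char *held[8];	/* lock class names, outermost first */
};

/*
 * Userspace mirror of the patched loop: a running task that is not
 * current gets a warning prefix on each entry instead of being skipped.
 * Writes NUL-terminated text into @buf and returns its length.
 */
static size_t sim_print_held_locks(const struct sim_task *p,
				   const struct sim_task *current_task,
				   char *buf, size_t len)
{
	const char *rnc = "";
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	if (p != current_task && p->running)
		rnc = " Not reliable: Running-Not-Current: ";
	for (int i = 0; i < p->depth; i++) {
		/* Same format as the patch: " %s#%d: " followed by the lock. */
		int n = snprintf(buf + off, len - off, " %s#%d: %s\n",
				 rnc, i, p->held[i]);

		if (n < 0 || (size_t)n >= len - off)
			return strlen(buf);	/* output was truncated */
		off += (size_t)n;
	}
	return off;
}
```

Putting the marker on every entry, rather than once in the "N locks held by" header, keeps each line self-describing when it is grepped out of a large dump.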


Also, when lockdep prints a task's "held" locks, it seems to include the lock the task is still
blocked trying to acquire (as with wiphy.mtx above). Is there a way to have lockdep print
information about the task that actually owns the lock, preferably including a backtrace?
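One possible angle, as far as I understand the non-PREEMPT_RT mutex code: struct mutex keeps its owner in lock->owner as the task_struct pointer with the low three bits used as flags (MUTEX_FLAG_WAITERS, MUTEX_FLAG_HANDOFF, MUTEX_FLAG_PICKUP), and the internal __mutex_owner() helper masks those bits off. A debug patch or a post-mortem tool could read wiphy.mtx's owner word, mask the flags, and dump that task (for example with sched_show_task()). The sketch below is only a userspace model of the decoding step; struct owner_task and the model_* names are hypothetical, and the flag values are assumed to match the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in the low bits of the owner word (assumed kernel values). */
#define MODEL_MUTEX_FLAG_WAITERS	((uintptr_t)0x01)
#define MODEL_MUTEX_FLAG_HANDOFF	((uintptr_t)0x02)
#define MODEL_MUTEX_FLAG_PICKUP		((uintptr_t)0x04)
#define MODEL_MUTEX_FLAGS		((uintptr_t)0x07)

/* Hypothetical stand-in for task_struct; 8-byte alignment frees 3 low bits. */
struct owner_task {
	_Alignas(8) const char *comm;
	int pid;
};

/* Build an owner word: task pointer ORed with flag bits. */
static uintptr_t model_owner_word(const struct owner_task *t, uintptr_t flags)
{
	assert(((uintptr_t)t & MODEL_MUTEX_FLAGS) == 0);
	return (uintptr_t)t | (flags & MODEL_MUTEX_FLAGS);
}

/* Like __mutex_owner(): strip the flag bits; NULL means no owner. */
static const struct owner_task *model_owner(uintptr_t word)
{
	return (const struct owner_task *)(word & ~MODEL_MUTEX_FLAGS);
}
```

Unlike lockdep's per-task held_locks list, the owner word is stored in the lock instance itself, so it names the holder even when that task's own lock list was skipped.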

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com