[PATCH 0/2] workqueue lockdep limitations/bugs

From: Johannes Berg
Date: Tue Aug 21 2018 - 08:38:48 EST



Hi Tejun, Lia, all,

While doing some unrelated testing earlier, I found a bug in workqueue
lockdep, and while looking into the code I found another bug, and another
limitation. I'll follow up with patches for the two bugs, and explain all
three issues here.


1)
Lockdep report is below - the problem is the following: cancel_work_sync()
calls flush_work(), which may or may not do something, depending on whether
or not it's running. This is reasonable for semantics of the workqueue, but
it's not fine for lockdep, and also the reason for issue 3 below.

The lockdep report I hit basically says:
* you're flushing (but really I'm doing cancel_work_sync) the work
item, so we take the workqueue lockdep map (it's an ordered
workqueue)
* the workqueue has other work items that can be on it, that take locks
that we already hold (at least indirectly via a dependency chain)
* thus a locking violation occurred - the other work might be pending
or running, so you have to wait for it and get a deadlock

Note how this isn't true - we really call cancel_work_sync(), so we don't
need to wait for any other work to run. If this one's running, it must be
the first one running on the workqueue (otherwise we'd just have cancelled
it and it wouldn't be running). Thus, the code is fine, but because
cancel_work_sync() just uses flush_work(), the annotation isn't quite
right. I've fixed that in the follow-up patch by making the annotation
depend on where we came from, but I'm not entirely sure that's the right
thing to do.

2)
The second issue is that we never really use work->lockdep_map when we
flush the work, and thus if we have code like

work_function
{
mutex_lock(&mutex);
}

other_function
{
mutex_lock(&mutex);
flush_work(&work);
}

we never actually get a lockdep report. I've fixed this by putting a lock
map acquire/release in the (hopefully) correct place.

3)
The third problem is more complicated, but it means that we likely miss a
lot of potential locking problems with ordered workqueues, and I think we
should really try to fix this somehow.

Let's say we again have an ordered workqueue, and the following:

work1_function
{
mutex_lock(&mutex);
}

work2_function
{
// empty
}

other_function
{
queue_work_on(&ordered_wq, &work1);
queue_work_on(&ordered_wq, &work2);
flush_workqueue(&ordered_wq);
mutex_lock(&mutex);
flush_work(&work2);
}

As far as I can tell, this will not generate a lockdep report, because the
flush_work() will not find any pwq in start_flush_work(), and thus will do
"goto already_gone" without ever doing anything with lockdep. After my
second patch, at least it will still acquire and release the
work2.lockdep_map, but that doesn't create any dependency with the whole
workqueue.

I realize that this is difficult since the work struct isn't tied to the
workqueue while it's not running, and it could possibly be used with
different workqueues, but I think we need to solve this too, somehow.

Perhaps we can give each work struct *two* lockdep maps? One that includes
the workqueue dependency, and one that doesn't? Then we could create a
dependency when scheduling:

schedule_work_on():
lock_map_acquire(work->wq_lockdep_map);
lock_map_acquire(wq->lockdep_map);

and then in flush we take both:
lock_map_acquire(work->lockdep_map);
lock_map_acquire(work->wq_lockdep_map);

(perhaps also when running the work).

Then, the above scenario should create the following relevant lockdep chain:

mutex --> work2->wq_lockdep_map
work2->wq_lockdep_map --> wq->lockdep_map
wq->lockdep_map --> work->wq_lockdep_map
work->wq_lockdep_map --> mutex

I think?

johannes


----
lockdep report for problem 1

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc8+ #1573 Not tainted
------------------------------------------------------
wpa_supplicant/708 is trying to acquire lock:
00000000a1e36a54 ((wq_completion)"%s"wiphy_name(local->hw.wiphy)){+.+.}, at: flush_work+0x175/0x280

but task is already holding lock:
00000000056b83e2 (&local->sta_mtx){+.+.}, at: __sta_info_flush+0x94/0x190

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&local->sta_mtx){+.+.}:
__mutex_lock+0x8e/0x9c0
ieee80211_del_key+0x40/0x1e0
nl80211_del_key+0x126/0x450
genl_family_rcv_msg+0x2aa/0x370
genl_rcv_msg+0x47/0x90
netlink_rcv_skb+0x4a/0x110
genl_rcv+0x24/0x40
netlink_unicast+0x186/0x230
netlink_sendmsg+0x3b7/0x3e0
sock_sendmsg+0x16/0x20
___sys_sendmsg+0x2d5/0x2f0
__sys_sendmsg+0x51/0x90
do_syscall_64+0x59/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #2 (&wdev->mtx){+.+.}:
__mutex_lock+0x8e/0x9c0
ieee80211_sta_rx_queued_mgmt+0x4b/0x3e0
ieee80211_iface_work+0x21f/0x340
process_one_work+0x28f/0x700
worker_thread+0x39/0x3f0
kthread+0x116/0x130
ret_from_fork+0x3a/0x50

-> #1 ((work_completion)(&sdata->work)){+.+.}:
process_one_work+0x262/0x700
worker_thread+0x39/0x3f0
kthread+0x116/0x130
ret_from_fork+0x3a/0x50

-> #0 ((wq_completion)"%s"wiphy_name(local->hw.wiphy)){+.+.}:
lock_acquire+0xa3/0x230
flush_work+0x195/0x280
__cancel_work_timer+0x137/0x1e0
ieee80211_sta_tear_down_BA_sessions+0x8d/0x150
__sta_info_destroy_part1+0x5d/0x9c0
__sta_info_flush+0xd8/0x190
ieee80211_set_disassoc+0xd2/0x7d0
ieee80211_mgd_auth+0x200/0x340
cfg80211_mlme_auth+0x152/0x440
nl80211_authenticate+0x2fd/0x360
genl_family_rcv_msg+0x2aa/0x370
genl_rcv_msg+0x47/0x90
netlink_rcv_skb+0x4a/0x110
genl_rcv+0x24/0x40
netlink_unicast+0x186/0x230
netlink_sendmsg+0x3b7/0x3e0
sock_sendmsg+0x16/0x20
___sys_sendmsg+0x2d5/0x2f0
__sys_sendmsg+0x51/0x90
do_syscall_64+0x59/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
(wq_completion)"%s"wiphy_name(local->hw.wiphy) --> &wdev->mtx --> &local->sta_mtx

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&local->sta_mtx);
lock(&wdev->mtx);
lock(&local->sta_mtx);
lock((wq_completion)"%s"wiphy_name(local->hw.wiphy));

*** DEADLOCK ***

5 locks held by wpa_supplicant/708:
#0: 00000000a50eb6ff (cb_lock){++++}, at: genl_rcv+0x15/0x40
#1: 000000008954a22a (genl_mutex){+.+.}, at: genl_rcv_msg+0x7a/0x90
#2: 000000007622d582 (rtnl_mutex){+.+.}, at: nl80211_pre_doit+0xe1/0x190
#3: 000000004cd75553 (&wdev->mtx){+.+.}, at: nl80211_authenticate+0x2b1/0x360
#4: 00000000056b83e2 (&local->sta_mtx){+.+.}, at: __sta_info_flush+0x94/0x190

stack backtrace:
CPU: 3 PID: 708 Comm: wpa_supplicant Not tainted 4.18.0-rc8+ #1573
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Call Trace:
dump_stack+0x85/0xcb
print_circular_bug.isra.19+0x1d1/0x2c0
check_prev_add.constprop.27+0x542/0x7c0
__lock_acquire+0x1107/0x11e0
lock_acquire+0xa3/0x230
flush_work+0x195/0x280
__cancel_work_timer+0x137/0x1e0
ieee80211_sta_tear_down_BA_sessions+0x8d/0x150
__sta_info_destroy_part1+0x5d/0x9c0
__sta_info_flush+0xd8/0x190
ieee80211_set_disassoc+0xd2/0x7d0
ieee80211_mgd_auth+0x200/0x340
cfg80211_mlme_auth+0x152/0x440
nl80211_authenticate+0x2fd/0x360
genl_family_rcv_msg+0x2aa/0x370
genl_rcv_msg+0x47/0x90
netlink_rcv_skb+0x4a/0x110
genl_rcv+0x24/0x40
netlink_unicast+0x186/0x230
netlink_sendmsg+0x3b7/0x3e0
sock_sendmsg+0x16/0x20
___sys_sendmsg+0x2d5/0x2f0
__sys_sendmsg+0x51/0x90
do_syscall_64+0x59/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe