[RESEND PATCH v5 0/6] Bluetooth: fix locking and socket killing in SCO and RFCOMM
From: Desmond Cheong Zhi Xi
Date: Wed Aug 04 2021 - 11:48:33 EST
Apologies, resending because I botched the previous cover-letter and
linux-bluetooth was left out of the cc. Just noticed when the
bluez.test.bot didn't respond. Please reply to this series rather than
the one sent previously.
Hi,
This patch series started out as a fix for "inconsistent lock state in
sco_sock_timeout" reported by Syzbot [1].
Patch 1 is sufficient to fix this error. This was also confirmed by the
reproducer for "BUG: corrupted list in kobject_add_internal (3)" [2]
which consistently hits the inconsistent lock state error.
However, while testing the proposed fix, the reproducer for [1] would
randomly return a human-unreadable error [3]. After further
investigation, this bug seems to be caused by an unrelated error with
forking [4].
While trying to fix the mysterious error, additional fixes were added,
such as switching to lock_sock and serializing _{set,clear}_timer.
Additionally, as the reproducer kept hitting the oom-killer, a fix for
SCO socket killing was also added.
The reproducer for [1] was robust enough to catch errors with these
additional fixes, hence all the patches in this series were squashed then
tested with the reproducer for [1].
Overall, this series makes the following changes:
- Patch 1: Schedule SCO sock timeouts with delayed_work to avoid
inconsistent lock usage (removes SOFTIRQs from SCO)
- Patch 2: Avoid a circular dependency between hci_dev_lock and
lock_sock (enables the switch to lock_sock)
- Patch 3: Switch to lock_sock in SCO now that SOFTIRQs and potential
deadlocks are removed
- Patch 4: Serialize calls to sco_sock_{set,clear}_timer
- Patch 5: Switch to lock_sock in RFCOMM
- Patch 6: fix SCO socket killing
v4 -> v5:
- Renamed the delayed_work variable, moved checks for sco_pi(sk)->conn
into sco_sock_{clear,set}_timer, as suggested by Luiz Augusto von Dentz
and Marcel Holtmann.
- Added check for conn->sk in sco_sock_timeout, accompanied by a
sock_hold to avoid UAF errors.
- Added check to flush work items before freeing conn.
- Avoid a circular dependency between hci_dev_lock and lock_sock.
- Switch to lock_sock in SCO, as suggested by Marcel Holtmann.
- Serial calls to sco_sock_{set,clear}_timer.
- Switch to lock_sock in RFCOMM, as suggested by Marcel Holtmann.
- Add a fix for SCO socket killing.
v3 -> v4:
- Switch to using delayed_work to schedule SCO sock timeouts instead
of using local_bh_disable. As suggested by Luiz Augusto von Dentz.
v2 -> v3:
- Split SCO and RFCOMM code changes, as suggested by Luiz Augusto von
Dentz.
- Simplify local bh disabling in SCO by using local_bh_disable/enable
inside sco_chan_del since local_bh_disable/enable pairs are reentrant.
v1 -> v2:
- Instead of pulling out the clean-up code out from sco_chan_del and
using it directly in sco_conn_del, disable local softirqs for relevant
sections.
- Disable local softirqs more thoroughly for instances of
bh_lock_sock/bh_lock_sock_nested in the bluetooth subsystem.
Specifically, the calls in af_bluetooth.c and rfcomm/sock.c are now made
with local softirqs disabled as well.
Link: https://syzkaller.appspot.com/bug?id=9089d89de0502e120f234ca0fc8a703f7368b31e [1]
Link: https://syzkaller.appspot.com/bug?extid=66264bf2fd0476be7e6c [2]
Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=172d819a300000 [3]
Link: https://syzkaller.appspot.com/bug?id=e1bf7ba90d8dafcf318666192aba1cfd65507377 [4]
Best wishes,
Desmond
Desmond Cheong Zhi Xi (6):
Bluetooth: schedule SCO timeouts with delayed_work
Bluetooth: avoid circular locks in sco_sock_connect
Bluetooth: switch to lock_sock in SCO
Bluetooth: serialize calls to sco_sock_{set,clear}_timer
Bluetooth: switch to lock_sock in RFCOMM
Bluetooth: fix repeated calls to sco_sock_kill
net/bluetooth/rfcomm/sock.c | 8 +--
net/bluetooth/sco.c | 107 +++++++++++++++++++++---------------
2 files changed, 66 insertions(+), 49 deletions(-)
--
2.25.1