Re: [PATCH 1/4] Bluetooth: hci_sync: pin conn across hci_le_create_conn_sync

From: Luiz Augusto von Dentz

Date: Mon May 11 2026 - 11:05:31 EST


Hi Michael,

On Mon, May 11, 2026 at 10:34 AM Michael Bommarito
<michael.bommarito@xxxxxxxxx> wrote:
>
> hci_le_create_conn_sync() runs from the cmd_sync workqueue with a
> struct hci_conn pointer it interprets out of the work item's void
> *data argument. The hci_conn_valid() check at function entry is a
> TOCTOU: nothing prevents hci_disconn_complete_evt() (executing on
> hdev->workqueue rx_work) from running between the
> hci_conn_hash_lookup walk in hci_conn_valid() and the body's first
> deref. hci_disconn_complete_evt() -> hci_conn_del() -> hci_conn_cleanup()
> unregisters the device and drops the final kref, which kfrees the
> hci_conn slot. The cmd_sync callback then writes through the freed
> pointer (clear_bit on conn->flags, conn->state, the four
> le_conn_*_interval fields).
>
> A KASAN slab-use-after-free splat in cache kmalloc-8k confirms the
> bug on linux-next tip commit bee6ea30c487 ("Add linux-next specific
> files for 20260421") under UML+KASAN, matching the slab geometry of
> the syzbot trace fixed in commit 035c25007c9e ("Bluetooth: hci_sync:
> Fix UAF in le_read_features_complete").
>
> Follow the reference-pinning pattern from commit 035c25007c9e
> ("Bluetooth: hci_sync: Fix UAF in le_read_features_complete") and
> commit 0beddb0c380b ("Bluetooth: hci_conn: fix potential UAF in
> create_big_sync"): the queue site takes a reference via
> hci_conn_get() so the slot is not freed between
> hci_disconn_complete_evt() retiring the conn and the cmd_sync
> callback / completion handler returning. The completion handler
> drops the reference on every exit path, including the -ECANCELED
> short-circuit.
>
> Introduce a static helper hci_cmd_sync_queue_conn_once() so the
> get/put pair is not open-coded at every queue site. See the
> helper's kerneldoc for the -EEXIST contract.
>
> The hci_conn_valid() check in the callback body is retained: a
> logically-deleted-but-still-referenced conn has stale
> hdev->conn_hash.list state, and continuing to drive a connection
> attempt on it would be a logic bug even though the memory is safe.
>
> Pauli Virtanen posted a series-wide variant of this fix as
> https://lore.kernel.org/linux-bluetooth/e18591f264c50e15917cb8b9e5f9798d9880979d.1762100290.git.pav@xxxxxx/
> (PATCH v2 8/8, 2025-11-02). KASAN reproducer captured under
> UML+KASAN (linux-next tip bee6ea30c487).
>
> Fixes: 881559af5f5c ("Bluetooth: hci_sync: Attempt to dequeue connection attempt")
> Cc: stable@xxxxxxxxxxxxxxx
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@xxxxxxxxx>
> ---
> net/bluetooth/hci_sync.c | 41 ++++++++++++++++++++++++++++++++--------
> 1 file changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index fd3aacdea512..b20e07474257 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -786,6 +786,31 @@ int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> }
> EXPORT_SYMBOL(hci_cmd_sync_queue_once);
>
> +/* Queue an HCI command entry once, pinning a hci_conn for the duration.
> + *
> + * On success, the cmd_sync queue owns one hci_conn_get() reference;
> + * the supplied destroy callback must hci_conn_put() to balance.
> + *
> + * On any failure return (including -EEXIST, where
> + * hci_cmd_sync_queue_once() neither invokes destroy nor consumes the
> + * data pointer because an existing entry already owns the slot), the
> + * helper releases the reference before returning, so callers do not
> + * need to discriminate failure codes to keep the refcount balanced.
> + */
> +static int hci_cmd_sync_queue_conn_once(struct hci_dev *hdev,

I'd suggest we drop the _once suffix, so just hci_cmd_sync_queue_conn().

> +					hci_cmd_sync_work_func_t func,
> +					struct hci_conn *conn,
> +					hci_cmd_sync_work_destroy_t destroy)
> +{
> +	int err;
> +
> +	err = hci_cmd_sync_queue_once(hdev, func, hci_conn_get(conn), destroy);
> +	if (err)
> +		hci_conn_put(conn);
> +
> +	return err;

Then we can fold the return (err == -EEXIST) ? 0 : err; logic into the
helper above, since I don't think any caller should need to queue
multiple procedures for the same conn.
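
To illustrate the refcount contract that would result, here is a rough
userspace model (not kernel code). The struct layouts, the one-slot
queue, and the names conn_get(), conn_put(), queue_once(),
conn_complete() and run_queue() are simplified stand-ins invented for
this sketch; only hci_cmd_sync_queue_conn() reflects the proposed name:

```c
/* Userspace model only: every type and helper here is a simplified
 * stand-in, not the real net/bluetooth definition.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct hci_conn {
	int refcnt;
};

typedef void (*destroy_t)(struct hci_conn *conn, int err);

/* One-slot stand-in for the cmd_sync queue. */
struct hci_dev {
	struct hci_conn *queued;
	destroy_t destroy;
};

static struct hci_conn *conn_get(struct hci_conn *conn)
{
	conn->refcnt++;
	return conn;
}

static void conn_put(struct hci_conn *conn)
{
	assert(conn->refcnt > 0);
	conn->refcnt--;
}

/* Models the "once" behaviour: a duplicate entry yields -EEXIST and the
 * queue does not take over the data pointer in that case.
 */
static int queue_once(struct hci_dev *hdev, struct hci_conn *conn,
		      destroy_t destroy)
{
	if (hdev->queued == conn)
		return -EEXIST;
	if (hdev->queued)
		return -EBUSY;	/* the model only has one slot */

	hdev->queued = conn;
	hdev->destroy = destroy;
	return 0;
}

/* Proposed shape: pin the conn, unpin on any failure, and report a
 * duplicate as success so callers need no -EEXIST special case.
 */
static int hci_cmd_sync_queue_conn(struct hci_dev *hdev,
				   struct hci_conn *conn, destroy_t destroy)
{
	int err;

	err = queue_once(hdev, conn_get(conn), destroy);
	if (err)
		conn_put(conn);

	return (err == -EEXIST) ? 0 : err;
}

/* Completion handler: drops the queue's reference on every path. */
static void conn_complete(struct hci_conn *conn, int err)
{
	(void)err;
	conn_put(conn);
}

/* Stand-in for the queue retiring its entry and calling destroy. */
static void run_queue(struct hci_dev *hdev, int err)
{
	struct hci_conn *conn = hdev->queued;
	destroy_t destroy = hdev->destroy;

	if (!conn)
		return;

	hdev->queued = NULL;
	hdev->destroy = NULL;
	destroy(conn, err);
}
```

With something along these lines, hci_connect_le_sync() could presumably
just return the helper's result directly.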

> +}
> +
> /* Run HCI command:
> *
> * - hdev must be running
> @@ -6982,36 +7007,38 @@ static void create_le_conn_complete(struct hci_dev *hdev, void *data, int err)
>  	bt_dev_dbg(hdev, "err %d", err);
>
>  	if (err == -ECANCELED)
> -		return;
> +		goto done;
>
>  	hci_dev_lock(hdev);
>
>  	if (!hci_conn_valid(hdev, conn))
> -		goto done;
> +		goto unlock;
>
>  	if (!err) {
>  		hci_connect_le_scan_cleanup(conn, 0x00);
> -		goto done;
> +		goto unlock;
>  	}
>
>  	/* Check if connection is still pending */
>  	if (conn != hci_lookup_le_connect(hdev))
> -		goto done;
> +		goto unlock;
>
>  	/* Flush to make sure we send create conn cancel command if needed */
>  	flush_delayed_work(&conn->le_conn_timeout);
>  	hci_conn_failed(conn, bt_status(err));
>
> -done:
> +unlock:
>  	hci_dev_unlock(hdev);
> +done:
> +	hci_conn_put(conn);
>  }
>
>  int hci_connect_le_sync(struct hci_dev *hdev, struct hci_conn *conn)
>  {
>  	int err;
>
> -	err = hci_cmd_sync_queue_once(hdev, hci_le_create_conn_sync, conn,
> -				      create_le_conn_complete);
> +	err = hci_cmd_sync_queue_conn_once(hdev, hci_le_create_conn_sync, conn,
> +					   create_le_conn_complete);
>  	return (err == -EEXIST) ? 0 : err;
>  }
>
> --
> 2.53.0
>


--
Luiz Augusto von Dentz