Re: [PATCH v2] thunderbolt: prevent XDomain delayed work use-after-free on disconnect

From: Mika Westerberg

Date: Thu May 28 2026 - 06:12:35 EST


Hi,

On Wed, May 27, 2026 at 07:46:04AM -0400, Michael Bommarito wrote:
> tb_xdp_handle_request() runs on system_wq and queues
> xd->state_work via queue_delayed_work() in three request handlers:
> PROPERTIES_CHANGED_REQUEST, UUID_REQUEST (via start_handshake),
> and LINK_STATE_CHANGE_REQUEST. Similarly, update_xdomain() queues
> xd->properties_changed_work when local properties change.
>
> Concurrently, tb_xdomain_remove() calls stop_handshake() which does
> cancel_delayed_work_sync() on both delayed works. Later,
> tb_xdomain_unregister() calls device_unregister() which eventually
> frees the xdomain. Since commit 559c1e1e0134 ("thunderbolt: Run
> tb_xdp_handle_request() in system workqueue") moved the request
> handler off tb->wq, the handler and the remove path are no longer
> serialized. If queue_delayed_work() executes after
> cancel_delayed_work_sync() but before the xdomain is freed, the
> delayed work fires on a freed object.
>
> Add xd->removing that tb_xdomain_remove() sets under xd->lock
> before calling stop_handshake(). Each external queue site holds
> the same lock and checks removing before calling
> queue_delayed_work(). This provides the mutual exclusion needed:
> either the queue site acquires the lock first and queues work that
> the subsequent cancel will see, or the remove path acquires the
> lock first and the queue site observes removing == true and skips
> the queue.
>
> Fixes: 559c1e1e0134 ("thunderbolt: Run tb_xdp_handle_request() in system workqueue")
> Cc: stable@xxxxxxxxxxxxxxx
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@xxxxxxxxx>
> ---
> v2: Rebased onto thunderbolt.git/next per Mika's request. Verified
> the race persists on next: tb_xdp_handle_request still runs on
> system_wq, the remove/unregister split does not add
> synchronization with the queue sites. Updated commit message to
> reflect that tb_xdomain_unregister() now does the
> device_unregister (split from tb_xdomain_remove on next).

Thanks Michael! Applied to thunderbolt.git/next. I would like to keep this
one in linux-next for a while and then send it with the rest for v7.2-rc1,
from where the stable folks can pick it up.
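
For anyone following along, the locking scheme described in the commit
message can be sketched in userspace roughly as below. This is only an
illustration under stated assumptions, not the applied patch: struct xd_sim
and the xd_sim_*() helpers are hypothetical names, a pthread mutex stands in
for xd->lock, and a plain work_pending flag stands in for the delayed work
and its cancel.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant parts of the xdomain. */
struct xd_sim {
	pthread_mutex_t lock;	/* plays the role of xd->lock */
	bool removing;		/* plays the role of xd->removing */
	bool work_pending;	/* stands in for a queued delayed work */
};

static void xd_sim_init(struct xd_sim *xd)
{
	pthread_mutex_init(&xd->lock, NULL);
	xd->removing = false;
	xd->work_pending = false;
}

/*
 * Queue site (think of a request handler): check the flag and queue
 * under the same lock the remove path uses. Returns true if the work
 * was queued, false if removal has already started.
 */
static bool xd_sim_queue_state_work(struct xd_sim *xd)
{
	bool queued = false;

	pthread_mutex_lock(&xd->lock);
	if (!xd->removing) {
		xd->work_pending = true;	/* models queue_delayed_work() */
		queued = true;
	}
	pthread_mutex_unlock(&xd->lock);

	return queued;
}

/*
 * Remove path: set the flag under the lock first, then cancel. Any
 * queue site that took the lock earlier has queued work this cancel
 * will see; any queue site that takes it later observes removing.
 */
static void xd_sim_remove(struct xd_sim *xd)
{
	pthread_mutex_lock(&xd->lock);
	xd->removing = true;
	pthread_mutex_unlock(&xd->lock);

	/* Models the cancel done by stop_handshake(), after the flag is set. */
	pthread_mutex_lock(&xd->lock);
	xd->work_pending = false;
	pthread_mutex_unlock(&xd->lock);
}
```

The point of the sketch is the ordering: once xd_sim_remove() has returned,
no caller of xd_sim_queue_state_work() can leave work pending, which is the
property the commit message argues for.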