Re: [PATCH] thunderbolt: prevent XDomain delayed work use-after-free on disconnect

From: Mika Westerberg

Date: Tue May 26 2026 - 09:58:35 EST


Hi,

On Mon, May 25, 2026 at 08:57:36AM -0400, Michael Bommarito wrote:
> tb_xdp_handle_request() runs on system_wq and queues
> xd->state_work via queue_delayed_work() in three request handlers:
> PROPERTIES_CHANGED_REQUEST, UUID_REQUEST (via start_handshake),
> and LINK_STATE_CHANGE_REQUEST. Similarly, update_xdomain() queues
> xd->properties_changed_work from bus_for_each_dev() when local
> properties change.
>
> Concurrently, tb_xdomain_remove() calls stop_handshake() which does
> cancel_delayed_work_sync() on both delayed works, then
> device_unregister() which eventually frees the xdomain. Since
> commit 559c1e1e0134 ("thunderbolt: Run tb_xdp_handle_request() in
> system workqueue") moved the request handler off tb->wq, the
> handler and the remove path are no longer serialized. If
> queue_delayed_work() executes after cancel_delayed_work_sync() but
> before the xdomain is freed, the delayed work fires on a freed
> object.
>
> Add xd->removing that tb_xdomain_remove() sets under xd->lock
> before calling stop_handshake(). Each external queue site holds
> the same lock and checks removing before calling
> queue_delayed_work(). This provides the mutual exclusion needed:
> either the queue site acquires the lock first and queues work that
> the subsequent cancel will see, or the remove path acquires the
> lock first and the queue site observes removing == true and skips
> the queue.
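
For reference, the flag-under-lock pattern described above can be sketched
in userspace roughly as follows. This is only a simplified model using
pthreads, not the actual patch: struct xd_model and the xd_model_*()
helpers are hypothetical names, and a plain counter stands in for
queue_delayed_work() and cancel_delayed_work_sync().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the xdomain; not the kernel struct. */
struct xd_model {
	pthread_mutex_t lock;
	bool removing;	/* set once by the remove path, under lock */
	int queued;	/* number of work items currently queued */
};

static void xd_model_init(struct xd_model *xd)
{
	pthread_mutex_init(&xd->lock, NULL);
	xd->removing = false;
	xd->queued = 0;
}

/* Queue site: returns true if work was queued, false if it was skipped. */
static bool xd_model_queue(struct xd_model *xd)
{
	bool ok = false;

	pthread_mutex_lock(&xd->lock);
	if (!xd->removing) {
		xd->queued++;	/* stands in for queue_delayed_work() */
		ok = true;
	}
	pthread_mutex_unlock(&xd->lock);

	return ok;
}

/*
 * Remove path: set the flag under the lock first, then cancel whatever
 * was queued before the flag became visible. Returns the number of
 * cancelled work items.
 */
static int xd_model_remove(struct xd_model *xd)
{
	int cancelled;

	pthread_mutex_lock(&xd->lock);
	xd->removing = true;
	pthread_mutex_unlock(&xd->lock);

	/* stands in for cancel_delayed_work_sync() in stop_handshake() */
	pthread_mutex_lock(&xd->lock);
	cancelled = xd->queued;
	xd->queued = 0;
	pthread_mutex_unlock(&xd->lock);

	return cancelled;
}
```

In this model a queue attempt that wins the lock before removal is seen
and cancelled by the remove path, and any attempt after the flag is set
is skipped, so nothing stays queued once removal has finished.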

There are a bunch of changes that touch xdomain.c in my thunderbolt.git/next
branch, and some of them change how tb_xdomain_remove() works. Could you
check against that branch whether we still have this issue?