[PATCH net-next V2 0/3] net/mlx5: Fix E-Switch work queue deadlock with devlink lock

From: Tariq Toukan

Date: Tue Apr 28 2026 - 01:11:19 EST


Hi,

See detailed description by Mark below [1].

Regards,
Tariq

[1]
mlx5_eswitch_cleanup() calls destroy_workqueue() while holding the
devlink lock through mlx5_uninit_one(). E-Switch workqueue workers also
need the devlink lock, but previously took it before checking whether
their work item was stale. Cleanup can therefore wait for a worker that
is blocked on the same devlink lock.

Mode changes have the same ordering hazard: the mode-change path holds
the devlink lock while tearing down the current mode, and old work may
still be pending on the E-Switch workqueue.

Fix this by making esw_wq_handler() check the generation counter before
attempting to take the devlink lock. The worker uses devl_trylock(); if
the lock is busy and the work is still current, it sleeps on an E-Switch
wait queue with a short timeout. Invalidation increments the generation
counter and wakes the wait queue, so stale workers exit without spinning
or blocking cleanup.
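
For readers who want the shape of the worker loop without opening the
patches, here is a rough userspace sketch of the idea. It is not the
driver code: a pthread mutex stands in for the devlink lock, a condition
variable stands in for the E-Switch wait queue, and every name
(esw_sketch, esw_work_sketch, the 10 ms timeout) is hypothetical. It also
assumes invalidation is done by whoever holds the big lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-ins; not the mlx5 structures. */
struct esw_sketch {
	pthread_mutex_t big_lock;  /* plays the role of the devlink lock */
	pthread_mutex_t wait_lock; /* protects generation and wait_q */
	pthread_cond_t wait_q;     /* plays the role of the E-Switch wait queue */
	unsigned int generation;   /* bumped to invalidate pending work */
	int handled;               /* number of work items that actually ran */
};

struct esw_work_sketch {
	struct esw_sketch *esw;
	unsigned int generation;   /* generation captured at enqueue time */
};

static void esw_sketch_init(struct esw_sketch *esw)
{
	pthread_mutex_init(&esw->big_lock, NULL);
	pthread_mutex_init(&esw->wait_lock, NULL);
	pthread_cond_init(&esw->wait_q, NULL);
	esw->generation = 0;
	esw->handled = 0;
}

/* Assumed to be called by the big_lock holder before it drains workers. */
static void esw_sketch_invalidate(struct esw_sketch *esw)
{
	pthread_mutex_lock(&esw->wait_lock);
	esw->generation++;
	pthread_cond_broadcast(&esw->wait_q);
	pthread_mutex_unlock(&esw->wait_lock);
}

/* Returns true if the work ran, false if it was dropped as stale. */
static bool esw_sketch_run_work(struct esw_work_sketch *work)
{
	struct esw_sketch *esw = work->esw;
	struct timespec deadline;

	for (;;) {
		pthread_mutex_lock(&esw->wait_lock);
		/* Staleness check comes first, before touching big_lock. */
		if (work->generation != esw->generation) {
			pthread_mutex_unlock(&esw->wait_lock);
			return false;
		}
		/* Never block on big_lock; only try it. */
		if (pthread_mutex_trylock(&esw->big_lock) == 0) {
			pthread_mutex_unlock(&esw->wait_lock);
			break;
		}
		/* Lock busy and work still current: sleep for up to ~10 ms. */
		timespec_get(&deadline, TIME_UTC);
		deadline.tv_nsec += 10 * 1000 * 1000;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_nsec -= 1000000000L;
			deadline.tv_sec++;
		}
		pthread_cond_timedwait(&esw->wait_q, &esw->wait_lock, &deadline);
		pthread_mutex_unlock(&esw->wait_lock);
	}

	esw->handled++; /* the real handler body would run here */
	pthread_mutex_unlock(&esw->big_lock);
	return true;
}
```

In this model, a caller that holds big_lock, calls
esw_sketch_invalidate() and then waits for workers cannot get stuck
behind one: each worker either sees the new generation on its next check
or is woken from the timed wait and sees it then.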

The generation counter already existed but was buried in
mlx5_esw_functions and only covered function-change events. The three
patches get from there to the fix in small steps.

Patch 1 moves the counter up to mlx5_eswitch. Pure refactor,
no behavior change.

Patch 2 cleans up the work queue plumbing: factors out the repeated
lock/check/dispatch boilerplate into a single esw_wq_handler() and
adds mlx5_esw_add_work() as the one place to enqueue work.

Patch 3 is the actual fix: check the generation before the lock, use
devl_trylock() instead of devl_lock(), add a wait queue so lock retries
do not spin, and invalidate pending work at the earliest safe operation
boundary. Cleanup invalidates before destroy_workqueue(), and mode
teardown unregisters the work-producing notifiers before invalidating so
new notifier work cannot capture the new generation.
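
The ordering in mode teardown can be illustrated with a toy model. This
is only a sketch under assumed names (esw_order_sketch,
esw_pending_sketch and the helpers are hypothetical, not driver symbols):
a registered notifier stamps new work with the current generation, so
bumping the generation while the notifier is still registered would let
late work look current.

```c
#include <stdbool.h>

/* Hypothetical model: only the fields needed to show the ordering. */
struct esw_order_sketch {
	unsigned int generation;
	bool notifier_registered;
};

struct esw_pending_sketch {
	unsigned int generation; /* captured when the notifier queued the work */
};

/* Notifier callback: queues work stamped with the current generation. */
static bool esw_order_notify(const struct esw_order_sketch *esw,
			     struct esw_pending_sketch *work)
{
	if (!esw->notifier_registered)
		return false; /* no producer, nothing is queued */
	work->generation = esw->generation;
	return true;
}

/* Work is current only if its stamp matches the live generation. */
static bool esw_order_is_current(const struct esw_order_sketch *esw,
				 const struct esw_pending_sketch *work)
{
	return work->generation == esw->generation;
}

/* Order described above: stop the producers first, then invalidate. */
static void esw_order_teardown_begin(struct esw_order_sketch *esw)
{
	esw->notifier_registered = false;
	esw->generation++;
}
```

With esw_order_teardown_begin(), work queued earlier stops being current
and no later notifier call can queue anything. If the two steps were
swapped, a notifier call landing between them would stamp its work with
the new generation and pass the staleness check.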

V2:

Split out from a larger series. The representor lifecycle improvements
will be sent separately.

Patch 3:
- Move generation invalidation after notifier unregister but before
  teardown, so old work is discarded early without allowing new notifier
  work to use the new generation.
- Replace cond_resched() polling with a wait queue to avoid CPU spinning
  while the devlink lock is held by a long operation.

Link to V1:
https://lore.kernel.org/all/20260409115550.156419-1-tariqt@xxxxxxxxxx/

Mark Bloch (3):
  net/mlx5: E-Switch, move work queue generation counter
  net/mlx5: E-Switch, introduce generic work queue dispatch helper
  net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq

 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 20 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  5 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 96 ++++++++++++-------
 3 files changed, 82 insertions(+), 39 deletions(-)


base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
--
2.44.0