[PATCH net V3 0/4] net/mlx5: Fixes for Socket-Direct

From: Tariq Toukan

Date: Thu Apr 23 2026 - 08:32:16 EST


Hi,

This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.

Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
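
As a rough illustration of the idea in Patch 1 (not the driver code), the
userspace sketch below guards a group state with a single lock so that a
second init or a second cleanup becomes a no-op. The names sd_group,
sd_group_init(), sd_group_cleanup() and the two-value state enum are
hypothetical, and a pthread mutex stands in for the devcom component lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical states; the real driver's state tracking may differ. */
enum sd_state { SD_STATE_DOWN, SD_STATE_UP };

/* Stand-in for the SD group; the state is kept in one place (the primary). */
struct sd_group {
	pthread_mutex_t lock;	/* plays the role of the devcom component lock */
	enum sd_state state;
	int bringups;		/* number of real bring-ups, for illustration */
	int teardowns;		/* number of real tear-downs, for illustration */
};

#define SD_GROUP_INIT { PTHREAD_MUTEX_INITIALIZER, SD_STATE_DOWN, 0, 0 }

/* Returns true only for the caller that actually brought the group up. */
static bool sd_group_init(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_DOWN) {
		g->bringups++;
		g->state = SD_STATE_UP;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}

/* Returns true only for the caller that actually tore the group down. */
static bool sd_group_cleanup(struct sd_group *g)
{
	bool did_work = false;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_UP) {
		g->teardowns++;
		g->state = SD_STATE_DOWN;
		did_work = true;
	}
	pthread_mutex_unlock(&g->lock);
	return did_work;
}
```

Because the state check and the state change happen under the same lock,
two PFs racing through init (or cleanup) cannot both do the work.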

Patch 2 fixes the debugfs "multi-pf" directory being stored on the
calling device's sd struct instead of the primary's, which caused
memory leaks and recreation errors when cleanup ran from a different PF.
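
The ownership rule in Patch 2 can be sketched in userspace as follows.
This is a simplified model under assumptions, not the driver code:
struct sd_dev, sd_dfs_create() and sd_dfs_remove() are hypothetical names,
and a heap allocation stands in for the debugfs directory.

```c
#include <stdlib.h>

/* Hypothetical per-PF struct; 'primary' points to itself on the primary. */
struct sd_dev {
	struct sd_dev *primary;
	void *dfs;		/* stand-in for the debugfs directory handle */
};

/* Create the directory once and record it on the primary, whoever calls. */
static int sd_dfs_create(struct sd_dev *caller)
{
	struct sd_dev *p = caller->primary;

	if (p->dfs)
		return 0;	/* already created, nothing to recreate */
	p->dfs = malloc(1);
	return p->dfs ? 0 : -1;
}

/* Remove via the primary, so cleanup from any PF finds the same handle. */
static void sd_dfs_remove(struct sd_dev *caller)
{
	struct sd_dev *p = caller->primary;

	free(p->dfs);
	p->dfs = NULL;
}
```

With the handle always stored on the primary, create from one PF and
remove from another operate on the same pointer, so nothing is leaked and
a later create does not collide with a stale directory.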

Patch 3 fixes missing cleanup on ETH probe/resume errors.

Patch 4 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.

Regards,
Tariq

V3:
- Link to V2:
https://lore.kernel.org/all/20260413105323.186411-1-tariqt@xxxxxxxxxx/
- Added "net/mlx5e: SD, Fix missing cleanup on probe/resume error"
patch to fix the missing cleanup bug. (Sashiko)
- Removed MLX5_SD_STATE_DESTROYING, moved
mlx5_devcom_comp_set_ready(false) to mlx5_sd_cleanup(), and simplified
the locking around SD state. (Sashiko)

Shay Drory (4):
net/mlx5: SD: Serialize init/cleanup
net/mlx5: SD, Keep multi-pf debugfs entries on primary
net/mlx5e: SD, Fix missing cleanup on probe/resume error
net/mlx5e: SD, Fix race condition in secondary device probe/remove

.../net/ethernet/mellanox/mlx5/core/en_main.c | 32 +++++++--
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 68 +++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 2 +
3 files changed, 86 insertions(+), 16 deletions(-)


base-commit: d40831b016b4986e70d20d0ad14e6a0c62318986
--
2.44.0