[PATCH net V2 0/3] net/mlx5: Fixes for Socket-Direct

From: Tariq Toukan

Date: Mon Apr 13 2026 - 06:59:17 EST


Hi,

This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.

Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
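As a rough illustration of the pattern Patch 1 describes (serialize under one lock and track the group state), here is a minimal userspace sketch. It is not the driver code. A pthread mutex stands in for mlx5_devcom_comp_lock(), and struct sd_group, the SD_STATE_* names and the counters are all hypothetical. The series only names MLX5_SD_STATE_DESTROYING. The sketch also assumes a design in which tear-down work runs with the lock dropped, and the DESTROYING state stops a concurrent init or cleanup from acting during that window.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical states; the series only names MLX5_SD_STATE_DESTROYING. */
enum sd_state { SD_STATE_DOWN, SD_STATE_UP, SD_STATE_DESTROYING };

/* Hypothetical stand-in for the SD group state kept on the primary device. */
struct sd_group {
	pthread_mutex_t lock;	/* models mlx5_devcom_comp_lock() */
	enum sd_state state;
	int bringups;		/* how many times bring-up actually ran */
	int teardowns;		/* how many times tear-down actually ran */
};

static struct sd_group group = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void *sd_init(void *arg)
{
	struct sd_group *g = arg;

	pthread_mutex_lock(&g->lock);
	if (g->state == SD_STATE_DOWN) {	/* only the first caller brings it up */
		g->bringups++;
		g->state = SD_STATE_UP;
	}
	pthread_mutex_unlock(&g->lock);
	return NULL;
}

static void *sd_cleanup(void *arg)
{
	struct sd_group *g = arg;

	pthread_mutex_lock(&g->lock);
	if (g->state != SD_STATE_UP) {	/* already down, or being torn down */
		pthread_mutex_unlock(&g->lock);
		return NULL;
	}
	g->state = SD_STATE_DESTROYING;	/* claim the tear-down */
	pthread_mutex_unlock(&g->lock);

	/* Assumption: slow tear-down work would run here, outside the lock. */

	pthread_mutex_lock(&g->lock);
	g->teardowns++;
	g->state = SD_STATE_DOWN;
	pthread_mutex_unlock(&g->lock);
	return NULL;
}

/* Run fn on n threads at once (capped at 16) against the shared group. */
static void run_concurrently(void *(*fn)(void *), int n)
{
	pthread_t tids[16];

	if (n > 16)
		n = 16;
	for (int i = 0; i < n; i++)
		pthread_create(&tids[i], NULL, fn, &group);
	for (int i = 0; i < n; i++)
		pthread_join(tids[i], NULL);
}
```

Build with -pthread. However many threads race in run_concurrently(sd_init, 8), bring-up runs once. A concurrent burst of sd_cleanup likewise tears down exactly once.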

Patch 2 fixes the debugfs "multi-pf" directory being stored in the
calling device's sd struct instead of the primary's. When cleanup ran
from a different PF, this leaked memory and caused errors when the
directory was recreated.

Patch 3 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound, by holding the primary's
device lock while operating on its auxiliary device.
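The locking rule in Patch 3 can be sketched in userspace as follows. This is only an approximation of the real change. A pthread mutex models the primary's device lock (device_lock() in the kernel), and struct primary_dev, primary_unbind() and secondary_operate() are hypothetical names. The point is that the secondary holds the lock for the whole operation and checks that the primary is still bound, so an unbind cannot slip in between the check and the access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the primary PF's auxiliary device. */
struct primary_dev {
	pthread_mutex_t lock;	/* models the primary's device lock */
	bool bound;		/* true while the auxiliary driver is bound */
	int ops;		/* operations performed on the bound device */
};

/* Primary side: unbind happens only under the device lock. */
static void primary_unbind(struct primary_dev *p)
{
	pthread_mutex_lock(&p->lock);
	p->bound = false;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Secondary side: hold the primary's lock across the check and the
 * access, so the primary cannot be unbound in between.
 */
static int secondary_operate(struct primary_dev *p)
{
	int ret = -1;

	pthread_mutex_lock(&p->lock);
	if (p->bound) {
		p->ops++;	/* safe: unbind waits until we unlock */
		ret = 0;
	}
	pthread_mutex_unlock(&p->lock);
	return ret;
}
```

After primary_unbind(), secondary_operate() returns -1 and leaves the device untouched instead of using a stale pointer.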

Regards,
Tariq

V2:
- Link to V1:
https://lore.kernel.org/all/20260330193412.53408-1-tariqt@xxxxxxxxxx/
- Reorder the patches so that "net/mlx5: SD: Serialize init/cleanup"
is first.
- Add MLX5_SD_STATE_DESTROYING to the patch above to handle a
concurrency edge case.
- Expand the commit message of "net/mlx5e: SD, Fix race condition in
secondary device probe/remove".

Shay Drory (3):
net/mlx5: SD: Serialize init/cleanup
net/mlx5: SD, Keep multi-pf debugfs entries on primary
net/mlx5e: SD, Fix race condition in secondary device probe/remove

.../net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++--
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 70 ++++++++++++++++---
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 2 +
3 files changed, 77 insertions(+), 13 deletions(-)


base-commit: 2dddb34dd0d07b01fa770eca89480a4da4f13153
--
2.44.0