[PATCH v4] IB/mlx5: Fix state corruption and resource leaks in loopback enablement
From: Prathamesh Deshpande
Date: Wed Apr 01 2026 - 19:53:11 EST
In mlx5_ib_alloc_transport_domain(), the early success path returned
'err', which is always 0 at that point, instead of a literal 0. Return
0 directly so the success path is explicit.

Additionally, as identified by Sashiko, if mlx5_ib_enable_lb() fails
to update the hardware, it leaves the software state inconsistent: the
reference counters stay incremented and 'enabled' is set to true while
loopback remains disabled in hardware. Fixing this in the caller
created a race window, because the mutex was released between
enablement and rollback.

Update mlx5_ib_enable_lb() to roll back the reference counters while
still holding the mutex, and to set the 'enabled' flag only if the
hardware command succeeds.

Also, add error handling in mlx5_ib_alloc_transport_domain() to
deallocate the transport domain (tdn) if loopback enablement fails.

Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@xxxxxxxxx>
---
v4:
- Moved rollback logic into mlx5_ib_enable_lb() to ensure atomicity
within the mutex and prevent race conditions [Sashiko].
v3:
- Also call mlx5_ib_disable_lb() on failure to roll back software state/counters
[Sashiko].
v2:
- Added deallocation of tdn if mlx5_ib_enable_lb() fails [Sashiko].
- Reworded commit message to reflect the functional fix and credit the tool.
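
Illustration (not part of the patch): a minimal userspace sketch of the
rollback ordering v4 aims for, in case it helps review. It is not driver
code. struct lb_state, fake_hw_update() and lb_enable() are hypothetical
stand-ins, a pthread mutex replaces the kernel mutex, and the "first
user" condition is simplified, since only part of the real condition is
visible in the hunk below.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for dev->lb; field names mirror the patch. */
struct lb_state {
	pthread_mutex_t mutex;
	int user_td;
	int qps;
	bool enabled;
};

/*
 * Hypothetical stand-in for the hardware command. It simply returns the
 * error code the caller asks it to simulate (0 means success).
 */
static int fake_hw_update(int simulated_err)
{
	return simulated_err;
}

/*
 * Take a reference and, if needed, enable loopback. On hardware failure
 * the reference taken by this call is dropped again before the mutex is
 * released, so no other thread can observe the half-updated state.
 */
static int lb_enable(struct lb_state *lb, bool td, bool qp, int simulated_err)
{
	int err = 0;

	pthread_mutex_lock(&lb->mutex);
	if (td)
		lb->user_td++;
	if (qp)
		lb->qps++;

	/* Simplified trigger; the driver's actual condition may differ. */
	if ((lb->user_td == 1 || lb->qps == 1) && !lb->enabled) {
		err = fake_hw_update(simulated_err);
		if (err) {
			/* Roll back only what this call incremented. */
			if (td)
				lb->user_td--;
			if (qp)
				lb->qps--;
		} else {
			lb->enabled = true;
		}
	}
	pthread_mutex_unlock(&lb->mutex);

	return err;
}
```

With this ordering a failed call leaves the counters and 'enabled'
exactly as they were before the call, so the caller only has to release
its own resources (here, the tdn) and does not need to call the disable
path.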
drivers/infiniband/hw/mlx5/main.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 635002e684a5..877b02e98033 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2022,7 +2022,14 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
dev->lb.qps == 1) {
if (!dev->lb.enabled) {
err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
- dev->lb.enabled = true;
+ if (err) {
+ if (td)
+ dev->lb.user_td--;
+ if (qp)
+ dev->lb.qps--;
+ } else {
+ dev->lb.enabled = true;
+ }
}
}
@@ -2068,9 +2075,13 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn,
if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
(!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
!MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
- return err;
+ return 0;
- return mlx5_ib_enable_lb(dev, true, false);
+ err = mlx5_ib_enable_lb(dev, true, false);
+ if (err)
+ mlx5_cmd_dealloc_transport_domain(dev->mdev, *tdn, uid);
+
+ return err;
}
static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn,
--
2.43.0