[PATCH net-next 1/3] net/mlx5: HWS, Check if device is down while polling for completion

From: Tariq Toukan

Date: Thu May 07 2026 - 13:37:55 EST


From: Yevgeny Kliteynik <kliteyn@xxxxxxxxxx>

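If the device is down for any reason (e.g. FLR), the HW no longer
generates completions, so there is no point in polling until the
timeout expires. Check the device state before polling and return
-ETIMEDOUT right away. The BWC layer already handles timeouts
specially: it breaks out of the rehash / resize / shrink loops to
avoid a chain of timeouts.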

Signed-off-by: Yevgeny Kliteynik <kliteyn@xxxxxxxxxx>
Reviewed-by: Erez Shitrit <erezsh@xxxxxxxxxx>
Reviewed-by: Shay Drori <shayd@xxxxxxxxxx>
Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
---
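Note (illustration only, not part of the patch): the fail-fast idea can
be sketched in a standalone form as below. All names here (fake_dev,
dev_state, poll_for_completions, the budget parameter) are hypothetical
stand-ins, not mlx5 definitions; the sketch only assumes that a device
in an error state never delivers further completions.

```c
#include <errno.h>

/* Hypothetical stand-ins for the driver's types; not the mlx5 definitions. */
enum dev_state { DEV_STATE_UP, DEV_STATE_INTERNAL_ERROR };

struct fake_dev {
	enum dev_state state;
	int completions_ready;	/* completions the "HW" will still deliver */
};

/* Poll until all pending rules complete or the poll budget is exhausted. */
static int poll_for_completions(struct fake_dev *dev, int *pending, int budget)
{
	/* Fail fast: a device in error state will never produce completions. */
	if (dev->state == DEV_STATE_INTERNAL_ERROR)
		return -ETIMEDOUT;

	while (*pending > 0) {
		if (budget-- <= 0)
			return -ETIMEDOUT;	/* slow path: budget exhausted */
		if (dev->completions_ready > 0) {
			dev->completions_ready--;
			(*pending)--;
		}
	}
	return 0;
}
```

Returning the same error code on both paths means a caller that already
treats a timeout as "stop retrying" needs no new handling for the
device-down case.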
.../ethernet/mellanox/mlx5/core/steering/hws/bwc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 6dcd9c2a78aa..eae02bc74221 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -422,6 +422,18 @@ int mlx5hws_bwc_queue_poll(struct mlx5hws_context *ctx,
 	if (!got_comp && !drain)
 		return 0;

+	if (unlikely(ctx->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)) {
+		/* If the device is down for any reason (e.g. FLR), the HW will
+		 * no longer generate completions.
+		 * Note that ETIMEDOUT is returned here because the BWC layer
+		 * already has a special handling for timeouts - it breaks the
+		 * rehash / resize / shrink loops to avoid chain of timeouts.
+		 */
+		mlx5_core_warn_once(ctx->mdev,
+				    "BWC poll: device is down, polling for completion aborted\n");
+		return -ETIMEDOUT;
+	}
+
 	queue_full = mlx5hws_send_engine_full(&ctx->send_queue[queue_id]);
 	while (queue_full || ((got_comp || drain) && *pending_rules)) {
 		ret = mlx5hws_send_queue_poll(ctx, queue_id, comp, burst_th);
--
2.44.0