Re: [PATCH net] net/mlx5e: Fix use-after-free in mlx5e_tx_reporter_timeout_recover

From: Tariq Toukan

Date: Tue May 12 2026 - 07:17:08 EST




On 12/05/2026 14:08, Cosmin Ratiu wrote:
On Wed, 2026-04-08 at 19:44 +0100, Matt Fleming wrote:
From: Matt Fleming <mfleming@xxxxxxxxxxxxxx>

First of all, apologies for the delay; I missed this, and it seems
nobody else reacted for more than a month.

Next time, you will probably get quicker reactions if you directly
CC the people involved in the patch that introduced the bug.
This will also make the patchwork checkers happier.


mlx5e_tx_reporter_timeout_recover() accesses sq->netdev after
mlx5e_safe_reopen_channels() has torn down and freed the channel (and
its embedded SQs). Replace the three sq->netdev references with
priv->netdev, which is safe because priv outlives channel teardown.

The netdev_err() call already used priv->netdev for this reason; make
the netdev_trylock()/netdev_unlock() and mlx5e_health_channel_eq_recover()
calls consistent.

This fixes the following KASAN splat:

  BUG: KASAN: use-after-free in mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
  Read of size 8 at addr ffff889860ed0b28 by task kworker/u113:2/5277

  Call Trace:
   mlx5e_tx_reporter_timeout_recover+0x1dd/0x360 [mlx5_core]
   devlink_health_reporter_recover+0xa2/0x150
   devlink_health_report+0x254/0x7c0
   mlx5e_reporter_tx_timeout+0x297/0x380 [mlx5_core]
   mlx5e_tx_timeout_work+0x109/0x170 [mlx5_core]
   process_one_work+0x677/0xf20
   worker_thread+0x51f/0xd90
   kthread+0x3a5/0x810
   ret_from_fork+0x208/0x400
   ret_from_fork_asm+0x1a/0x30

Fixes: 83ac0304a2d7 ("net/mlx5e: Fix deadlocks between devlink and netdev instance locks")
Signed-off-by: Matt Fleming <mfleming@xxxxxxxxxxxxxx>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index afdeb1b3d425..8409ae73768f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -160,13 +160,13 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	 * channels are being closed for other reason and this work is not
 	 * relevant anymore.
 	 */
-	while (!netdev_trylock(sq->netdev)) {
+	while (!netdev_trylock(priv->netdev)) {
 		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
 			return 0;
 		msleep(20);
 	}
 
-	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
+	err = mlx5e_health_channel_eq_recover(priv->netdev, eq, sq->cq.ch_stats);
 	if (!err) {
 		to_ctx->status = 0; /* this sq recovered */
 		goto out;
@@ -186,7 +186,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
 	   err);
 out:
-	netdev_unlock(sq->netdev);
+	netdev_unlock(priv->netdev);
 	return err;
 }

Thank you for the fix. This is a real problem, which can happen if direct
SQ recovery fails and all channels need to be reopened; that is
apparently what happened in your KASAN report.
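For anyone following the thread, a minimal userspace model of that sequence may help. It is a sketch under stated assumptions, not driver code: `netdev_model`, `priv_model`, `channel_model`, `sq_model`, `alloc_channel()`, `reopen_channels()`, `timeout_recover()` and `run_scenario()` are hypothetical stand-ins for the real mlx5e structures and helpers, and the netdev instance lock is modeled as a plain flag. It shows why the fix works: once a failed direct recovery reopens the channels, the SQ embedded in the old channel is freed, so both the lock and the unlock have to go through priv->netdev, which outlives the channels.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the mlx5e structures.
 * These are NOT the driver's real definitions. */
struct netdev_model {
	int locked;                     /* models the netdev instance lock */
};

struct priv_model;

struct sq_model {
	struct netdev_model *netdev;    /* copy of priv->netdev */
	struct priv_model *priv;
};

struct channel_model {
	struct sq_model sq;             /* the SQ is embedded in the channel */
};

struct priv_model {
	struct netdev_model *netdev;    /* outlives every channel */
	struct channel_model *channel;
};

static struct channel_model *alloc_channel(struct priv_model *priv)
{
	struct channel_model *c = calloc(1, sizeof(*c));

	if (c) {
		c->sq.netdev = priv->netdev;
		c->sq.priv = priv;
	}
	return c;
}

/* Models the "reopen all channels" fallback: the old channel, and the
 * SQ embedded in it, is freed and replaced by a new allocation. */
static int reopen_channels(struct priv_model *priv)
{
	struct channel_model *fresh = alloc_channel(priv);

	if (!fresh)
		return -1;
	free(priv->channel);
	priv->channel = fresh;
	return 0;
}

/* Models the fixed recover path: lock and unlock both go through priv. */
static int timeout_recover(struct sq_model *sq, int direct_recovery_ok)
{
	struct priv_model *priv = sq->priv;  /* sq is still valid here */
	int err;

	assert(!priv->netdev->locked);
	priv->netdev->locked = 1;            /* ~ netdev_trylock(priv->netdev) */

	err = direct_recovery_ok ? 0 : reopen_channels(priv);

	/* After reopen_channels(), sq points into freed memory; reading
	 * sq->netdev here would be the use-after-free from the report. */
	priv->netdev->locked = 0;            /* ~ netdev_unlock(priv->netdev) */
	return err;
}

/* Runs one recovery; returns 0 on success with the netdev unlocked. */
static int run_scenario(int direct_recovery_ok)
{
	struct netdev_model nd = { .locked = 0 };
	struct priv_model priv = { .netdev = &nd, .channel = NULL };
	int err;

	priv.channel = alloc_channel(&priv);
	if (!priv.channel)
		return -1;
	err = timeout_recover(&priv.channel->sq, direct_recovery_ok);
	free(priv.channel);                  /* whichever channel is current */
	return err || nd.locked;
}
```

run_scenario(1) models a successful direct SQ recovery and run_scenario(0) the fallback that reopens all channels; both should return 0 with the netdev left unlocked. Changing the unlock in timeout_recover() to go through sq->netdev would reintroduce the bug in the model, and a build with AddressSanitizer would likely flag it as a heap use-after-free.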

Reviewed-by: Cosmin Ratiu <cratiu@xxxxxxxxxx>

Thanks for your patch.
I think that, due to our delayed response, you'll have to resend.

You can add our tags:
Reviewed-by: Tariq Toukan <tariqt@xxxxxxxxxx>