Re: [PATCH net 2/3] net/mlx5: SD, Keep multi-pf debugfs entries on primary
From: Jakub Kicinski
Date: Wed Apr 01 2026 - 23:16:05 EST
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: SD, Keep multi-pf debugfs entries on primary
This commit fixes an issue where the "multi-pf" debugfs directory was stored
in the calling device's sd struct but created under the primary device's root.
It explicitly stores the debugfs dentry in the primary device's sd struct.
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> index 060649645012..4c80b9d25283 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
[ ... ]
> @@ -491,6 +499,7 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
> {
> struct mlx5_sd *sd = mlx5_get_sd(dev);
> struct mlx5_core_dev *primary, *pos;
> + struct mlx5_sd *primary_sd;
> int i;
>
> if (!sd)
> @@ -500,10 +509,12 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
> goto out;
>
> primary = mlx5_sd_get_primary(dev);
> + primary_sd = mlx5_get_sd(primary);
> mlx5_sd_for_each_secondary(i, primary, pos)
> sd_cmd_unset_secondary(pos);
> sd_cmd_unset_primary(primary);
> - debugfs_remove_recursive(sd->dfs);
> + debugfs_remove_recursive(primary_sd->dfs);
> + primary_sd->dfs = NULL;
Can concurrent unbind operations cause a use-after-free and double-free
regression here?
In mlx5_sd_cleanup(), the check !mlx5_devcom_comp_is_ready(sd->devcom) is
performed locklessly. If multiple physical functions in the Socket-Direct
group are unbound concurrently, multiple threads can observe ready == true
and enter this teardown block simultaneously.
Since the cleanup code now operates on the shared primary_sd pointer instead
of the local device's sd, the thread for the primary physical function might
complete its unbind and execute sd_cleanup(dev), freeing its sd object
(which is primary_sd).
Concurrently, the thread for a secondary physical function will dereference
the freed primary_sd to access primary_sd->dfs, causing a use-after-free.
Both threads could also pass the same pointer to debugfs_remove_recursive(),
resulting in a double-free.
Note: This was fixed later in the patch series by commit 92528cec4e0b91deb
("net/mlx5: SD: Serialize init/cleanup").
--
pw-bot: cr