Re: [PATCH net 2/3] net/mlx5: SD, Keep multi-pf debugfs entries on primary
From: Shay Drori
Date: Thu Apr 02 2026 - 15:51:12 EST
On 02/04/2026 6:09, Jakub Kicinski wrote:
External email: Use caution opening links or attachments
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: SD, Keep multi-pf debugfs entries on primary
This commit fixes an issue where the "multi-pf" debugfs directory was stored
in the calling device's sd struct but created under the primary device's root.
It explicitly stores the debugfs dentry in the primary device's sd struct.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c[ ... ]
index 060649645012..4c80b9d25283 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
@@ -491,6 +499,7 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_sd *sd = mlx5_get_sd(dev);
struct mlx5_core_dev *primary, *pos;
+ struct mlx5_sd *primary_sd;
int i;
if (!sd)
@@ -500,10 +509,12 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
goto out;
primary = mlx5_sd_get_primary(dev);
+ primary_sd = mlx5_get_sd(primary);
mlx5_sd_for_each_secondary(i, primary, pos)
sd_cmd_unset_secondary(pos);
sd_cmd_unset_primary(primary);
- debugfs_remove_recursive(sd->dfs);
+ debugfs_remove_recursive(primary_sd->dfs);
+ primary_sd->dfs = NULL;
Can concurrent unbind operations cause a use-after-free and double-free
regression here?
In mlx5_sd_cleanup(), the check !mlx5_devcom_comp_is_ready(sd->devcom) is
performed locklessly. If multiple physical functions in the Socket-Direct
group are unbound concurrently, multiple threads can observe ready == true
and enter this teardown block simultaneously.
Since the cleanup code now operates on the shared primary_sd pointer instead
of the local device's sd, the thread for the primary physical function might
complete its unbind and execute sd_cleanup(dev), freeing its sd object
(which is primary_sd).
Concurrently, the thread for a secondary physical function will dereference
the freed primary_sd to access primary_sd->dfs, causing a use-after-free.
Both threads could also pass the same pointer to debugfs_remove_recursive(),
resulting in a double-free.
Note: This was fixed later in the patch series by commit 92528cec4e0b91deb
("net/mlx5: SD: Serialize init/cleanup").
This is correct, will re-arrange the patches so that this patch will be
after "net/mlx5: SD: Serialize init/cleanup".
--
pw-bot: cr