Re: [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration
From: Mark Bloch
Date: Sat May 02 2026 - 16:09:08 EST
On 01/05/2026 7:16, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@xxxxxxxxxx>
>
> mlx5_eswitch_register_vport_reps() only installs representor callbacks and
> marks the rep type as registered. If the E-Switch is already in switchdev
> mode, the newly registered rep type must then be loaded for already enabled
> vports.
>
> That load path needs to run under the devlink lock, which is not held by
> the auxiliary driver registration context. Queue the reload to the E-Switch
> workqueue, whose handler acquires the devlink lock, and load the relevant
> representors from there.
>
> The unregister path is unchanged and still unloads representors
> synchronously while tearing down the registered callbacks.
>
> Signed-off-by: Mark Bloch <mbloch@xxxxxxxxxx>
> Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
> ---
sashiko.dev says:
"
[Note: The format of this report is altered due to recitation restrictions. Code quoting and inline diffs have been omitted.]
commit f4f3e4872e599ef6d13a1f2bacffd52e6dbc8d37
Author: Mark Bloch <mbloch@xxxxxxxxxx>
net/mlx5: E-switch, load reps via work queue after registration
This commit defers the loading of representors for already enabled vports to
a workqueue when the E-Switch is in switchdev mode. This ensures the load
path runs under the devlink lock, which is not held during auxiliary driver
registration.
I have a couple of questions regarding error handling in this update.
Looking at mlx5_eswitch_register_vport_reps_locked(), the return value of
mlx5_esw_add_work() is not checked. Since mlx5_esw_add_work() uses
kzalloc_obj() with GFP_ATOMIC, it can fail under memory pressure and return
-ENOMEM. If this allocation fails, the background work to load the
representors is never queued, but the representors are already marked as
REP_REGISTERED. Could this result in a NULL pointer dereference if callers
attempt to access the representor data under the assumption that it was
successfully initialized?
Additionally, in mlx5_eswitch_reload_reps_blocked(), failures from
mlx5_esw_offloads_rep_load() appear to be silently ignored. If the call
fails for MLX5_VPORT_UPLINK, the function aborts early without loading the
other vports. Within the loop for the remaining vports, the return value is
also ignored. Does this leave the system in a partially loaded and
inconsistent state, where representors are registered but uninitialized for
some vports?
"
REP_REGISTERED does not imply initialized representor data. Users that need
private data only proceed on REP_LOADED, and failed loads leave the state
REP_REGISTERED. The queued reload is for late rep-ops registration while already
in switchdev, where there is no synchronous error path back to the auxiliary
probe, so it is intentionally opportunistic. The strict path is still the
switchdev transition: there, representor load errors are propagated and the
mode change is aborted, so this does not change the consistency model.
Mark
> .../mellanox/mlx5/core/eswitch_offloads.c | 34 +++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 8f656253981b..f26d1652dd05 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -4563,6 +4563,38 @@ mlx5_eswitch_register_vport_reps_blocked(struct mlx5_eswitch *esw,
> }
> }
>
> +static void mlx5_eswitch_reload_reps_blocked(struct mlx5_eswitch *esw)
> +{
> + struct mlx5_vport *vport;
> + unsigned long i;
> +
> + if (esw->mode != MLX5_ESWITCH_OFFLOADS)
> + return;
> +
> + if (mlx5_esw_offloads_rep_load(esw, MLX5_VPORT_UPLINK))
> + return;
> +
> + mlx5_esw_for_each_vport(esw, i, vport) {
> + if (!vport)
> + continue;
> + if (!vport->enabled)
> + continue;
> + if (vport->vport == MLX5_VPORT_UPLINK)
> + continue;
> + if (!mlx5_eswitch_vport_has_rep(esw, vport->vport))
> + continue;
> +
> + mlx5_esw_offloads_rep_load(esw, vport->vport);
> + }
> +}
> +
> +static void mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
> +{
> + mlx5_esw_reps_block(esw);
> + mlx5_eswitch_reload_reps_blocked(esw);
> + mlx5_esw_reps_unblock(esw);
> +}
> +
> static void
> mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
> const struct mlx5_eswitch_rep_ops *ops,
> @@ -4574,6 +4606,8 @@ mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
> mlx5_esw_reps_block(esw);
> mlx5_eswitch_register_vport_reps_blocked(esw, ops, rep_type);
> mlx5_esw_reps_unblock(esw);
> +
> + mlx5_esw_add_work(esw, mlx5_eswitch_reload_reps);
> }
>
> void mlx5_eswitch_register_vport_reps(struct mlx5_eswitch *esw,