Re: [PATCH net-next 2/7] net/mlx5: E-Switch, move work queue generation counter

From: Mark Bloch

Date: Thu Apr 09 2026 - 13:58:59 EST

On 09/04/2026 14:55, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@xxxxxxxxxx>
>
> The generation counter in mlx5_esw_functions is used to detect stale
> work items on the E-Switch work queue. Move it from mlx5_esw_functions
> to the top-level mlx5_eswitch struct so it can guard all work types,
> not just function-change events.
>
> This is a mechanical refactor: no behavioral change.
>
> Signed-off-by: Mark Bloch <mbloch@xxxxxxxxxx>
> Reviewed-by: Cosmin Ratiu <cratiu@xxxxxxxxxx>
> Signed-off-by: Tariq Toukan <tariqt@xxxxxxxxxx>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 3 ++-
> drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 ++--
> 3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 123c96716a54..1986d4d0e886 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1075,7 +1075,7 @@ static void mlx5_eswitch_event_handler_unregister(struct mlx5_eswitch *esw)
> if (esw->mode == MLX5_ESWITCH_OFFLOADS &&
> mlx5_eswitch_is_funcs_handler(esw->dev)) {
> mlx5_eq_notifier_unregister(esw->dev, &esw->esw_funcs.nb);
> - atomic_inc(&esw->esw_funcs.generation);
> + atomic_inc(&esw->generation);
> }
> }
>
> @@ -2072,6 +2072,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
> mutex_init(&esw->state_lock);
> init_rwsem(&esw->mode_lock);
> refcount_set(&esw->qos.refcnt, 0);
> + atomic_set(&esw->generation, 0);
>
> esw->enabled_vports = 0;
> esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> index 5128f5020dae..0c3d2bdebf8c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> @@ -340,7 +340,6 @@ struct mlx5_host_work {
>
> struct mlx5_esw_functions {
> struct mlx5_nb nb;
> - atomic_t generation;
> bool host_funcs_disabled;
> u16 num_vfs;
> u16 num_ec_vfs;
> @@ -410,6 +409,7 @@ struct mlx5_eswitch {
> struct mlx5_devcom_comp_dev *devcom;
> u16 enabled_ipsec_vf_count;
> bool eswitch_operation_in_progress;
> + atomic_t generation;
> };
>
> void esw_offloads_disable(struct mlx5_eswitch *esw);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index a078d06f4567..b2e7294d3a5c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -3667,7 +3667,7 @@ esw_vfs_changed_event_handler(struct mlx5_eswitch *esw, int work_gen,
> devl_lock(devlink);
>
> /* Stale work from one or more mode changes ago. Bail out. */
> - if (work_gen != atomic_read(&esw->esw_funcs.generation))
> + if (work_gen != atomic_read(&esw->generation))
> goto unlock;

Sashiko writes:

> Does this generation counter pattern allow stale work to bypass the guard?
> While this issue seems to have existed before this patch, it appears the
> notifier can execute concurrently with the teardown path, creating a race
> condition.
> If a concurrent hardware event triggers mlx5_esw_funcs_changed_handler() on
> another CPU during mode teardown:
> 1. The teardown path calls mlx5_eswitch_event_handler_unregister(), which
> unregisters the notifier and increments esw->generation.
> 2. mlx5_eq_notifier_unregister() removes the callback but does not
> synchronize with currently executing RCU readers.
> 3. The concurrently running notifier reads the newly incremented generation
> number via atomic_read(&esw->generation) and queues host_work with it.
> 4. The worker thread (esw_vfs_changed_event_handler) blocks waiting for
> devl_lock(devlink), which is currently held by the teardown thread.
> 5. Once teardown finishes transitioning to legacy mode and drops the lock,
> the worker resumes and checks the generation.
> 6. Since both the work_gen and esw->generation match the newly incremented
> value, the guard is bypassed.
> Because teardown set esw_funcs.num_vfs = 0 but firmware might still report
> new_num_vfs > 0 (as VFs remain active), the worker calls
> mlx5_eswitch_load_vf_vports(). Could this execute offloads-specific
> initialization while the eswitch is in legacy mode and lead to state
> corruption?

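False positive. atomic_notifier_call_chain() runs under the RCU
read lock, and atomic_notifier_chain_unregister() performs a
synchronize_rcu() before returning. So by the time
mlx5_eq_notifier_unregister() returns and the generation is
incremented, no notifier callback is still running.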
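
To make the ordering argument concrete, here is a minimal userspace
sketch in C11. It is not the driver code: struct esw_sketch, struct
work_sketch and the *_sketch() helpers are hypothetical names, and the
comment in teardown_sketch() stands in for the synchronize_rcu() done
on unregister. It only models the generation check itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the E-Switch: only the counter matters here. */
struct esw_sketch {
	atomic_int generation;
};

/* Hypothetical stand-in for a queued work item. */
struct work_sketch {
	int work_gen;
};

/* Notifier side: snapshot the generation at the time the work is queued. */
static void queue_work_sketch(struct esw_sketch *esw, struct work_sketch *work)
{
	work->work_gen = atomic_load(&esw->generation);
}

/*
 * Teardown side. Assumption: the notifier has already been unregistered
 * and every in-flight callback has finished before we get here, so all
 * existing snapshots were taken from the old generation.
 */
static void teardown_sketch(struct esw_sketch *esw)
{
	atomic_fetch_add(&esw->generation, 1);
}

/* Worker side: work whose snapshot no longer matches must bail out. */
static bool work_is_stale_sketch(struct esw_sketch *esw,
				 const struct work_sketch *work)
{
	return work->work_gen != atomic_load(&esw->generation);
}
```

Work snapshotted before teardown_sketch() fails the check afterwards.
A snapshot taken after the increment would pass it, which is why the
increment has to come only once unregister has returned.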

Mark
>
> new_num_vfs = MLX5_GET(query_esw_functions_out, out,
> @@ -3729,7 +3729,7 @@ int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type
> esw = container_of(esw_funcs, struct mlx5_eswitch, esw_funcs);
>
> host_work->esw = esw;
> - host_work->work_gen = atomic_read(&esw_funcs->generation);
> + host_work->work_gen = atomic_read(&esw->generation);
>
> INIT_WORK(&host_work->work, esw_functions_changed_event_handler);
> queue_work(esw->work_queue, &host_work->work);