Re: [PATCH net] net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown

From: Jakub Kicinski

Date: Fri Jun 12 2026 - 21:43:06 EST


This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown

This change clears attr->act_id_restore_rule to NULL after
esw_add_restore_rule() fails, so the teardown path no longer
dereferences an ERR_PTR cookie as if it were a real mlx5_flow_handle.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index a9001d1c902f..4c135858f297 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5863,6 +5863,7 @@ int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_a
>  	attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
>  	if (IS_ERR(attr->act_id_restore_rule)) {
>  		err = PTR_ERR(attr->act_id_restore_rule);
> +		attr->act_id_restore_rule = NULL;
>  		goto err_rule;
>  	}

[High]
The commit message states that after this change "teardown only sees
NULL or a valid handle", framing NULL as benign. Is that invariant
actually enforced on the consumer side?

Looking at the teardown helper:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:mlx5e_tc_action_miss_mapping_put() {
    struct mapping_ctx *ctx = mlx5e_get_priv_obj_mapping(priv);

    if (is_mdev_switchdev_mode(priv->mdev))
        mlx5_del_flow_rules(attr->act_id_restore_rule);
    mapping_remove(ctx, act_miss_mapping);
}

There is no NULL check before mlx5_del_flow_rules(), and
mlx5_del_flow_rules() itself dereferences handle->rule[0] without a
NULL check:

drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:mlx5_del_flow_rules() {
    struct fs_fte *fte;
    int i;
    ...
    fs_get_obj(fte, handle->rule[0]->node.parent);
    down_write_ref_node(&fte->node, false);
    ...
}

So on any path where mlx5e_tc_action_miss_mapping_put() is reached with
attr->act_id_restore_rule == NULL in switchdev mode, this patch
converts the original ERR_PTR dereference into a NULL dereference at
the same site rather than avoiding the crash.
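
To illustrate why swapping ERR_PTR for NULL does not by itself make the
pointer safe to pass along, here is a minimal userspace sketch. The
macros are stand-ins modeled on include/linux/err.h, not the kernel
definitions themselves, and handle_is_usable() is a hypothetical helper
invented for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on include/linux/err.h (assumption: the
 * top MAX_ERRNO addresses are reserved for encoded error codes). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical helper: true only for a pointer that may be dereferenced.
 * NULL passes IS_ERR() as "not an error", so it needs its own check. */
static inline bool handle_is_usable(const void *handle)
{
	return !IS_ERR_OR_NULL(handle);
}
```

IS_ERR(NULL) is false, so a consumer that only ever checked IS_ERR()
would treat the cleared field as a valid handle. Both values need to be
rejected before the dereference.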

Today the put is gated by attr->ct_attr.offloaded in
mlx5_tc_ct_delete_flow(), and __mlx5_tc_ct_flow_offload() leaves
offloaded clear when mlx5e_tc_action_miss_mapping_get() fails, so the
specific oops in the call trace should no longer be reachable. But
should the put be made symmetric with the new "NULL or valid" producer
contract, e.g.:

    if (is_mdev_switchdev_mode(priv->mdev) && attr->act_id_restore_rule)
        mlx5_del_flow_rules(attr->act_id_restore_rule);

so that any future caller that relies on the documented invariant
(including mlx5e_clone_flow_attr_for_post_act() in en_tc.c, which
already inherits this field) does not reintroduce the same oops with a
slightly different fault address?
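
As a rough model of the guarded put suggested above, the following
userspace sketch shows the intended behaviour. All fake_* names and
types are hypothetical stand-ins, not the real mlx5 structures or
helpers, and the boolean parameter stands in for
is_mdev_switchdev_mode():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mlx5_flow_handle and mlx5_flow_attr. */
struct fake_handle { int num_rules; };
struct fake_attr { struct fake_handle *act_id_restore_rule; };

static int del_calls;    /* times the rule-delete helper ran */
static int remove_calls; /* times the mapping was released */

/* Models mlx5_del_flow_rules(): dereferences the handle unconditionally,
 * so it must never be called with NULL. */
static void fake_del_flow_rules(struct fake_handle *handle)
{
	handle->num_rules = 0;
	del_calls++;
}

/* Models mapping_remove(): always safe to call. */
static void fake_mapping_remove(void)
{
	remove_calls++;
}

/* Models the put with the extra NULL guard: the rule is deleted only
 * when one was installed, while the mapping is always released. */
static void fake_action_miss_mapping_put(struct fake_attr *attr, bool switchdev)
{
	if (switchdev && attr->act_id_restore_rule)
		fake_del_flow_rules(attr->act_id_restore_rule);
	fake_mapping_remove();
}
```

With the guard in place, a put on an attr whose rule was cleared to NULL
skips the delete and still releases the mapping, which matches the
"NULL or valid handle" contract described in the commit message.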
--
pw-bot: cr