[bug report] Potential refcounting issues in 'drivers/net/ethernet/mellanox/mlx4/srq.c', between 'mlx4_srq_event()' and 'mlx4_srq_free()'
From: Ginger
Date: Thu Apr 23 2026 - 04:05:34 EST
Dear Linux kernel maintainers,
My research static analyzer found a potential atomicity bug in the
'drivers/net/ethernet/mellanox/mlx4' driver, specifically in
'drivers/net/ethernet/mellanox/mlx4/srq.c'.
Kernel version: long-term kernel v6.18.9
Potentially racing concurrent executions:
T0:
mlx4_srq_free
--> spin_lock_irq(&srq_table->lock);
--> radix_tree_delete(&srq_table->tree, srq->srqn);
--> spin_unlock_irq(&srq_table->lock);
--> if (refcount_dec_and_test(&srq->refcount))
T1:
mlx4_srq_event
--> rcu_read_lock();
--> srq = radix_tree_lookup(&srq_table->tree, srqn & (dev->caps.num_srqs - 1));
--> rcu_read_unlock();
--> refcount_inc(&srq->refcount);
--> if (refcount_dec_and_test(&srq->refcount))
In T1, the increment of 'srq->refcount' does not check whether T0
has already dropped the count to zero. If it has, the 'refcount_inc()'
call will increment it from zero to one, and the subsequent
'if (refcount_dec_and_test(&srq->refcount))' will evaluate to true,
resulting in an additional call to 'complete(&srq->free)'.
This is potentially problematic for mlx4 NICs, since T1 may then keep
using an SRQ that T0 is in the middle of freeing.
Thank you for your time and consideration.
Best regards,
Ginger