On Mon. 10 Mar 2025 at 03:59, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:
value changed: 0x0000000000002b9d -> 0x0000000000002b9e
Increased by '1' ...
I assume this problem is caused by increasing the per-netdevice statistic in
https://elixir.bootlin.com/linux/v6.13.6/source/net/can/af_can.c#L289
pkg_stats->tx_frames++;
pkg_stats->tx_frames_delta++;
We update the statistics for the device and in this specific case the
hrtimer fired on two CPUs resulting in a can_send() to the same netdevice.
Do you agree with this quick analysis?
Ack. Same conclusion here.
Isn't there some lock-less per-cpu safe statistic handling within netdev
we might pick for our use-case?
I see two solutions. Either we use lock_sock(skb->sk) and
release_sock(skb->sk) or we can change the types of
can_pkg_stats->tx_frames and can_pkg_stats->tx_frames_delta from long
to atomic_long_t.
The atomic_long_t is the closest thing to a lock-less solution. But my
preference goes to lock_sock(), which looks more natural in this
context. And lock_sock() is just a spinlock which under the hood is
also an atomic, so no big penalty either.
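
To make the atomic_long_t option concrete, here is a rough userspace
sketch, not a patch. It uses C11 atomics and pthreads as stand-ins: each
thread plays one CPU calling can_send(), and atomic_fetch_add_explicit()
takes the place of the kernel's atomic_long_inc(). The names
pkg_stats_sketch, sender_arg, sender() and run_senders() are made up for
illustration and do not exist in af_can.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the two counters in can_pkg_stats. */
struct pkg_stats_sketch {
	atomic_long tx_frames;
	atomic_long tx_frames_delta;
};

struct sender_arg {
	struct pkg_stats_sketch *stats;
	long frames;
};

/* Each thread plays the role of one CPU sending on the same netdevice. */
static void *sender(void *p)
{
	struct sender_arg *arg = p;

	for (long i = 0; i < arg->frames; i++) {
		/* Kernel analogue: atomic_long_inc(&pkg_stats->tx_frames); */
		atomic_fetch_add_explicit(&arg->stats->tx_frames, 1,
					  memory_order_relaxed);
		atomic_fetch_add_explicit(&arg->stats->tx_frames_delta, 1,
					  memory_order_relaxed);
	}
	return NULL;
}

/* Run nthreads concurrent senders and return the final tx_frames value. */
static long run_senders(int nthreads, long frames_per_thread)
{
	struct pkg_stats_sketch stats;
	struct sender_arg arg = { &stats, frames_per_thread };
	pthread_t tid[16];	/* arbitrary cap for this sketch */
	int rc;

	assert(nthreads > 0 && nthreads <= 16);
	atomic_init(&stats.tx_frames, 0);
	atomic_init(&stats.tx_frames_delta, 0);

	for (int i = 0; i < nthreads; i++) {
		rc = pthread_create(&tid[i], NULL, sender, &arg);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++) {
		rc = pthread_join(tid[i], NULL);
		assert(rc == 0);
	}

	/* Both counters are bumped once per frame, so they must agree. */
	assert(atomic_load(&stats.tx_frames) ==
	       atomic_load(&stats.tx_frames_delta));
	return atomic_load(&stats.tx_frames);
}
```

Calling run_senders(2, 100000) should return exactly 200000 every time.
With plain long counters and "++" the same test can lose updates, which
is the kind of race reported above. Relaxed ordering is enough here
because the counters are pure statistics and nothing else is ordered
against them.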