On Thu, 2017-01-12 at 09:22 +0100, Oliver Hartkopp wrote:
> But my main concern is:
> The reason why can_rx_delete_receiver() was introduced was the need to
> remove a huge number of receivers with can_rx_unregister().
> When you call synchronize_rcu() after each receiver removal this would
> potentially lead to a big performance issue when e.g. closing CAN_RAW
> sockets with a high number of receivers.
> So the idea was to remove/unlink the receiver hlist_del_rcu(&r->list)
> and also kmem_cache_free(rcv_cache, r) by some rcu mechanism - so that
> all elements are cleaned up by rcu at a later point.
> Is it possible that the problems emerge due to hlist_del_rcu(&r->list)
> and you accidentally fix it with your introduced synchronize_rcu()?
I agree this patch does not fix the root cause.
The main problem seems to be that the sockets themselves are not RCU
protected.
If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.
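To illustrate the idea, here is a minimal single-threaded userspace
sketch. It is not kernel code and not real RCU: every toy_* name is
hypothetical, and a plain reader counter stands in for the read-side
critical section. It only shows the ordering that matters: the close
path queues the free (the role call_rcu() plays in the kernel), and the
object stays valid until the reader that was already using it is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct toy_sock {
	int id;
	struct toy_rcu_head rcu;
};

static struct toy_rcu_head *toy_pending;	/* queued free callbacks */
static int toy_readers;				/* active read-side sections */
static int toy_freed;				/* sockets actually freed */

static void toy_rcu_read_lock(void)
{
	toy_readers++;
}

static void toy_rcu_read_unlock(void)
{
	toy_readers--;
	assert(toy_readers >= 0);
}

/* Queue a callback instead of freeing right away. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/*
 * Run the queued callbacks, but only when no reader is active.
 * Returns the number of callbacks that were run.
 */
static int toy_grace_period(void)
{
	int ran = 0;

	if (toy_readers > 0)
		return 0;

	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

static void toy_sock_free_rcu(struct toy_rcu_head *head)
{
	/* Recover the enclosing toy_sock from its embedded rcu member. */
	struct toy_sock *sk = (struct toy_sock *)
		((char *)head - offsetof(struct toy_sock, rcu));

	free(sk);
	toy_freed++;
}

/* Returns the id read by the "delivery path", or -1 on failure. */
int demo_deferred_free(void)
{
	struct toy_sock *sk = malloc(sizeof(*sk));
	int id_seen;

	if (!sk)
		return -1;
	sk->id = 42;

	toy_rcu_read_lock();				/* delivery path starts using sk */
	toy_call_rcu(&sk->rcu, toy_sock_free_rcu);	/* close path defers the free */

	if (toy_grace_period() != 0)			/* reader active: nothing may run */
		return -1;
	id_seen = sk->id;				/* sk is still valid here */
	toy_rcu_read_unlock();

	if (toy_grace_period() != 1)			/* now the socket is freed */
		return -1;
	return id_seen;
}
```

Calling demo_deferred_free() returns the id read by the delivery path.
Freeing the socket directly in the close path, instead of queueing it,
would leave that read touching freed memory.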
On recent kernels, the following patch could help: