The potential issue is tricky to spot because it is introduced patch by patch.
Up to this patch, the socket release procedure looks solid and reliable. E.g. the P2P netdev destruction path:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_detach
                  ovpn_udp_socket_detach
                    setup_udp_tunnel_sock
netdev_run_todo
  rcu_barrier  <- no running ovpn_udp_encap_recv after this point
  free_netdev
After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will be done. All good.
ok
Then the following patch, 'ovpn: implement TCP transport', decouples ovpn_socket_release_kref() from ovpn_socket_detach() by scheduling the socket detach function call:
ovpn_socket_release_kref
  ovpn_socket_schedule_release
    schedule_work(&sock->work)
And only some time later is the socket actually detached:
ovpn_socket_release_work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
And until this detach takes place, the UDP handler can call ovpn_udp_encap_recv() any number of times.
So, we can end up with this scenario:
ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
  ovpn_peer_release_p2p
    ovpn_peer_del_p2p
      ovpn_peer_put
        ovpn_peer_release_kref
          ovpn_peer_release
            ovpn_socket_put
              ovpn_socket_release_kref
                ovpn_socket_schedule_release
                  schedule_work(&sock->work)
netdev_run_todo
  rcu_barrier
  free_netdev

ovpn_udp_encap_recv  <- called for an incoming UDP packet
  ovpn_from_udp_sock  <- returns pointer to freed memory
  // Any access to the ovpn pointer is a use-after-free

ovpn_socket_release_work  <- kernel finally invokes the work
  ovpn_socket_detach
    ovpn_udp_socket_detach
      setup_udp_tunnel_sock
To address the issue, I see two possible solutions:
1. flush the workqueue somewhere before the netdev release
Yes! This is what I was missing. This will also solve the "how can the module wait for all workers to be done before unloading?" problem.