Re: [PATCH net-next v11 09/23] ovpn: implement basic RX path (UDP)

From: Antonio Quartulli
Date: Sun Dec 01 2024 - 18:39:32 EST


On 29/11/2024 17:10, Sabrina Dubroca wrote:
2024-11-26, 02:32:38 +0200, Sergey Ryazanov wrote:
On 15.11.2024 17:02, Antonio Quartulli wrote:
On 11/11/2024 02:54, Sergey Ryazanov wrote:
[...]
+    skb_reset_transport_header(skb);
+    skb_probe_transport_header(skb);
+    skb_reset_inner_headers(skb);
+
+    memset(skb->cb, 0, sizeof(skb->cb));

Why do we need to zero the control buffer here?

To avoid the next layer to assume the cb is clean while it is not.
Other drivers do the same as well.

AFAIR, there is no convention to clean the control buffer before the handing
over. The common practice is a bit opposite, programmer shall not assume
that the control buffer has been zeroed.

Not a big deal to clean it here, we just can save some CPU cycles avoiding
it.

I think this was recommended by Sabrina as well.

Curious. It's macsec that does not zero it, or I've not understood how it
was done.

I only remember discussing a case [1] where one function within ovpn
was expecting a cleared skb->cb to behave correctly but the caller did
not clear it. In general, as you said, clearing cb "to be nice to
other layers" is not expected. Sorry if some comments I made were
confusing.

No problem at all.
I misunderstood some statement and went the wrong route.
Thanks a lot Sergey for pointing this out.

I am only clearing the cb before usage as required by internal assumptions.


[1] https://lore.kernel.org/netdev/ZtXOw-NcL9lvwWa8@hog


+struct ovpn_struct *ovpn_from_udp_sock(struct sock *sk)
+{
+    struct ovpn_socket *ovpn_sock;
+
+    if (unlikely(READ_ONCE(udp_sk(sk)->encap_type) !=
UDP_ENCAP_OVPNINUDP))
+        return NULL;
+
+    ovpn_sock = rcu_dereference_sk_user_data(sk);
+    if (unlikely(!ovpn_sock))
+        return NULL;
+
+    /* make sure that sk matches our stored transport socket */
+    if (unlikely(!ovpn_sock->sock || sk != ovpn_sock->sock->sk))
+        return NULL;
+
+    return ovpn_sock->ovpn;

Now, returning of this pointer is safe. But the following TCP
transport support calls the socket release via a scheduled work.
What extends socket lifetime and makes it possible to receive a UDP
packet way after the interface private data release. Is it correct
assumption?

Sorry you lost me when sayng "following *TCP* transp[ort support calls".
This function is invoked only in UDP context.
Was that a typ0?

Yeah, you are right. The question sounds like a riddle. I should eventually
stop composing emails at midnight. Let me paraphrase it.

The potential issue is tricky since we create it patch-by-patch.

Up to this patch the socket releasing procedure looks solid and reliable.
E.g. the P2P netdev destroying:

ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
ovpn_peer_release_p2p
ovpn_peer_del_p2p
ovpn_peer_put
ovpn_peer_release_kref
ovpn_peer_release
ovpn_socket_put
ovpn_socket_release_kref
ovpn_socket_detach
ovpn_udp_socket_detach
setup_udp_tunnel_sock
netdev_run_todo
rcu_barrier <- no running ovpn_udp_encap_recv after this point

It's more the synchronize_net in unregister_netdevice_many_notify?
rcu_barrier waits for pending kfree_rcu/call_rcu, synchronize_rcu
waits for rcu_read_lock sections (see the comments for rcu_barrier and
synchronize_rcu in kernel/rcu/tree.c).

free_netdev

After the setup_udp_tunnel_sock() call no new ovpn_udp_encap_recv() will be
spawned. And after the rcu_barrier() all running ovpn_udp_encap_recv() will
be done. All good.

Then, the following patch 'ovpn: implement TCP transport' disjoin
ovpn_socket_release_kref() and ovpn_socket_detach() by scheduling the socket
detach function call:

ovpn_socket_release_kref
ovpn_socket_schedule_release
schedule_work(&sock->work)

And long time after the socket will be actually detached:

ovpn_socket_release_work
ovpn_socket_detach
ovpn_udp_socket_detach
setup_udp_tunnel_sock

And until this detaching will take a place, UDP handler can call
ovpn_udp_encap_recv() whatever number of times.

So, we can end up with this scenario:

ovpn_netdev_notifier_call(NETDEV_UNREGISTER)
ovpn_peer_release_p2p
ovpn_peer_del_p2p
ovpn_peer_put
ovpn_peer_release_kref
ovpn_peer_release
ovpn_socket_put
ovpn_socket_release_kref
ovpn_socket_schedule_release
schedule_work(&sock->work)
netdev_run_todo
rcu_barrier
free_netdev

ovpn_udp_encap_recv <- called for an incoming UDP packet
ovpn_from_udp_sock <- returns pointer to freed memory
// Any access to ovpn pointer is the use-after-free

ovpn_socket_release_work <- kernel finally ivoke the work
ovpn_socket_detach
ovpn_udp_socket_detach
setup_udp_tunnel_sock

To address the issue, I see two possible solutions:
1. flush the workqueue somewhere before the netdev release
2. set ovpn_sock->ovpn = NULL before scheduling the socket detach

Going with #2, we could fully split detach into a synchronous part and
async part (with async not needed for UDP). detach_sync clears the
pointers (CBs, strp_stop(), ovpn_sock->ovpn, setup_udp_tunnel_sock) so
that no more packets will be sent through the ovpn driver.

Related to that topic, I'm not sure what's keeping a reference on the
peer to guarantee it doesn't get freed before we're done with
peer->tcp.tx_work at the end of ovpn_tcp_socket_detach. Maybe all this
tcp stuff should move from the peer to ovpn_socket?

Good point.
It may make sense to move everything to ovpn_socket and avoid this extra dependency on the peer, while it is not needed at all.

I will play with it and see what comes out.

Thanks!

Regards,



--
Antonio Quartulli
OpenVPN Inc.