Re: [PATCH net-next v20 12/25] ovpn: implement TCP transport

From: Antonio Quartulli
Date: Mon Mar 03 2025 - 10:49:22 EST


On 03/03/2025 16:08, Sabrina Dubroca wrote:
2025-02-27, 02:21:37 +0100, Antonio Quartulli wrote:
@@ -94,11 +96,23 @@ void ovpn_socket_release(struct ovpn_peer *peer)
* detached before it can be picked by a concurrent reader.
*/
lock_sock(sock->sock->sk);
- ovpn_socket_put(peer, sock);
+ released = ovpn_socket_put(peer, sock);
release_sock(sock->sock->sk);
/* align all readers with sk_user_data being NULL */
synchronize_rcu();
+
+ /* following cleanup should happen with lock released */
+ if (released) {
+ if (sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+ netdev_put(sock->ovpn->dev, &sock->dev_tracker);
+ } else if (sock->sock->sk->sk_protocol == IPPROTO_TCP) {
+ /* wait for TCP jobs to terminate */
+ ovpn_tcp_socket_wait_finish(sock);
+ ovpn_peer_put(sock->peer);
+ }
+ kfree_rcu(sock, rcu);

kfree_rcu after synchronize_rcu is a bit unexpected. Do we still need
to wait before we free sock, now that we have synchronize_rcu before?

Ah good point. Waiting one RCU period is what kfree_rcu() is there for, hence we could just call kfree() at this point.


+ }
}



+static int ovpn_tcp_parse(struct strparser *strp, struct sk_buff *skb)
+{
+ struct strp_msg *rxm = strp_msg(skb);
+ __be16 blen;
+ u16 len;
+ int err;
+
+ /* when packets are written to the TCP stream, they are prepended with
+ * two bytes indicating the actual packet size.
+ * Here we read those two bytes and move the skb data pointer to the
+ * beginning of the packet

There's no update to skb->data being done in ovpn_tcp_parse AFAICT.

I concur :)


[...]
+static void ovpn_tcp_rcv(struct strparser *strp, struct sk_buff *skb)
+{
+ struct ovpn_peer *peer = container_of(strp, struct ovpn_peer, tcp.strp);
+ struct strp_msg *msg = strp_msg(skb);
+ size_t pkt_len = msg->full_len - 2;
+ size_t off = msg->offset + 2;
+ u8 opcode;
+
+ /* ensure skb->data points to the beginning of the openvpn packet */
+ if (!pskb_pull(skb, off)) {

Is that the one you mean in the previous comment?

Yes. The comment was probably placed there in a previous version/prototype and never moved.

I'll fix that.


+ net_warn_ratelimited("%s: packet too small for peer %u\n",
+ netdev_name(peer->ovpn->dev), peer->id);
+ goto err;
+ }
[some checks]
+ /* DATA_V2 packets are handled in kernel, the rest goes to user space */
+ opcode = ovpn_opcode_from_skb(skb, 0);
+ if (unlikely(opcode != OVPN_DATA_V2)) {
+ if (opcode == OVPN_DATA_V1) {
+ net_warn_ratelimited("%s: DATA_V1 detected on the TCP stream\n",
+ netdev_name(peer->ovpn->dev));
+ goto err;

In TCP encap, receiving OVPN_DATA_V1 packets is going to kill the peer:

+err:
+ dev_core_stats_rx_dropped_inc(peer->ovpn->dev);
+ kfree_skb(skb);
+ ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_ERROR);
+}
+

but that's not the case with the UDP encap (ovpn_udp_encap_recv simply
drops those packets). Should the TCP/UDP behavior be consistent?

Ideally a UDP DATA_V1 could be just a replayed/spoofed packet, so killing the peer doesn't sound good. It'd be a very easy attack.

Doing the same in TCP is much much harder (if practical at all) and in that case it'd be impossible to understand what's happening on the stream, so we just abort.

I think any TCP connection (without TCP-MD5) that gets messed up this way will abort.






+void ovpn_tcp_send_skb(struct ovpn_peer *peer, struct socket *sock,
+ struct sk_buff *skb)
+{
+ u16 len = skb->len;
+
+ *(__be16 *)__skb_push(skb, sizeof(u16)) = htons(len);
+
+ spin_lock_nested(&sock->sk->sk_lock.slock, OVPN_TCP_DEPTH_NESTING);

With this, lockdep is still going to complain in the unlikely case
that ovpn-TCP traffic is carried over another ovpn-TCP socket, right?
(probably fine to leave it like that)

Yeah.
I am not sure how much complexity we'd need to workaround that.
I also assume it's fine to keep it this way (this is also what L2TP does).



[...]
+static int ovpn_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+{
+ struct ovpn_socket *sock;
+ int ret, linear = PAGE_SIZE;
+ struct ovpn_peer *peer;
+ struct sk_buff *skb;
+
+ lock_sock(sk);
+ rcu_read_lock();
+ sock = rcu_dereference_sk_user_data(sk);
+ if (unlikely(!sock || !sock->peer || !ovpn_peer_hold(sock->peer))) {
+ rcu_read_unlock();
+ release_sock(sk);
+ return -EIO;
+ }
+ rcu_read_unlock();
+ peer = sock->peer;

This used to be done under RCU in previous versions of the series. Why
is it after rcu_read_unlock now? (likely safe since we're under
lock_sock so detach can't happen)

Yeah, while restructuring I assumed it could be taken out of the RCU read section and so I kept it out.


+
+ if (msg->msg_flags & ~MSG_DONTWAIT) {
+ ret = -EOPNOTSUPP;
+ goto peer_free;
+ }



--
Antonio Quartulli
OpenVPN Inc.