Re: [PATCH net 1/2] ovpn: tcp - use cached peer pointer in ovpn_tcp_close()
From: Antonio Quartulli
Date: Tue May 12 2026 - 03:30:03 EST
Hi,
On 12/05/2026 06:56, David CARLIER wrote:
Hi Eric,
On Tue, 12 May 2026 at 05:29, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
On Mon, May 11, 2026 at 9:20 PM David Carlier <devnexen@xxxxxxxxx> wrote:
ovpn_tcp_close() loads the ovpn_socket via rcu_dereference_sk_user_data()
under rcu_read_lock(), takes a reference on sock->peer, caches the peer
pointer in a local, and drops the read lock. It then passes sock->peer
(rather than the cached local) to ovpn_peer_del(), re-dereferencing the
ovpn_socket after the RCU read section has ended.
Unlike ovpn_tcp_sendmsg(), which uses the same "load under RCU, use
after unlock" pattern but is protected by lock_sock() held across the
function, ovpn_tcp_close() runs without the socket lock: inet_release()
invokes sk_prot->close() without taking lock_sock first.
ovpn_socket_release() can therefore complete its kref_put -> detach ->
synchronize_rcu -> kfree(sock) sequence concurrently, in the window
after ovpn_tcp_close() drops rcu_read_lock() but before it dereferences
sock->peer. The synchronize_rcu() in ovpn_socket_release() protects
readers that use the dereferenced pointer inside the RCU read section,
not those that escape the pointer to a local and use it afterwards.
A reproducer follows the pattern of commit 94560267d6c4 ("ovpn: tcp -
don't deref NULL sk_socket member after tcp_close()"): trigger a peer
removal (keepalive expiration or netlink OVPN_CMD_DEL_PEER) at the same
moment userspace closes the TCP fd. That commit fixed the detach-side
of the same race window; this one fixes the close-side at a different
victim.
Use the already-loaded peer local, which is held by the
ovpn_peer_hold() taken under RCU and is the correct argument anyway.
The remaining lines in the function already use peer; switching this
call makes the function consistent and removes the dangling sock
dereference.
Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
---
drivers/net/ovpn/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
index 65054cc84be5..ed4782de141a 100644
--- a/drivers/net/ovpn/tcp.c
+++ b/drivers/net/ovpn/tcp.c
@@ -588,7 +588,7 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
peer = sock->peer;
rcu_read_unlock();
- ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
+ ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
peer->tcp.sk_cb.prot->close(sk, timeout);
ovpn_peer_put(peer);
}
I do not see how rcu_read_lock() can protect sock->peer changes.
I think we need to be more careful, in the prior code which reads
sock->peer 3 times.
If RCU is used, we better use it properly.
rcu_read_lock();
sock = rcu_dereference_sk_user_data(sk);
if (!sock || !sock->peer || !ovpn_peer_hold(sock->peer)) {
rcu_read_unlock();
return;
}
peer = sock->peer;
rcu_read_unlock();
You're right. sock->peer is only assigned once, in ovpn_socket_new()
under lock_sock() before the ovpn_socket is even published via
rcu_assign_sk_user_data(), and never touched again - which is why the
three reads happen to be stable. But that's the kind of invariant the
next person has to go dig up, so the pattern is doing nobody any
favours.
v2 tomorrow with sock->peer read once into the local up front, and
all the subsequent uses going through that.
Same multi-read pattern shows up in ovpn_tcp_recvmsg(),
ovpn_tcp_sendmsg(), ovpn_tcp_data_ready() and ovpn_tcp_write_space()
- happy to roll those into v2 as well, or punt to a follow-up,
whichever you'd prefer.
I have to double check all the mentioned spots, however this sounds more like a larger refactoring for net-next that we should be carefully review, rather than a bugfix.
Regards,
--
Antonio Quartulli
OpenVPN Inc.