Re: [PATCH net 1/2] ovpn: tcp - use cached peer pointer in ovpn_tcp_close()

From: David CARLIER

Date: Tue May 12 2026 - 00:56:53 EST


Hi Eric,

On Tue, 12 May 2026 at 05:29, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Mon, May 11, 2026 at 9:20 PM David Carlier <devnexen@xxxxxxxxx> wrote:
> >
> > ovpn_tcp_close() loads the ovpn_socket via rcu_dereference_sk_user_data()
> > under rcu_read_lock(), takes a reference on sock->peer, caches the peer
> > pointer in a local, and drops the read lock. It then passes sock->peer
> > (rather than the cached local) to ovpn_peer_del(), re-dereferencing the
> > ovpn_socket after the RCU read section has ended.
> >
> > Unlike ovpn_tcp_sendmsg(), which uses the same "load under RCU, use
> > after unlock" pattern but is protected by lock_sock() held across the
> > function, ovpn_tcp_close() runs without the socket lock: inet_release()
> > invokes sk_prot->close() without taking lock_sock first.
> >
> > ovpn_socket_release() can therefore complete its kref_put -> detach ->
> > synchronize_rcu -> kfree(sock) sequence concurrently, in the window
> > after ovpn_tcp_close() drops rcu_read_lock() but before it dereferences
> > sock->peer. The synchronize_rcu() in ovpn_socket_release() protects
> > readers that use the dereferenced pointer inside the RCU read section,
> > not those that escape the pointer to a local and use it afterwards.
> >
> > A reproducer follows the pattern of commit 94560267d6c4 ("ovpn: tcp -
> > don't deref NULL sk_socket member after tcp_close()"): trigger a peer
> > removal (keepalive expiration or netlink OVPN_CMD_DEL_PEER) at the same
> > moment userspace closes the TCP fd. That commit fixed the detach-side
> > of the same race window; this one fixes the close-side at a different
> > victim.
> >
> > Use the already-loaded peer local, which is held by the
> > ovpn_peer_hold() taken under RCU and is the correct argument anyway.
> > The remaining lines in the function already use peer; switching this
> > call makes the function consistent and removes the dangling sock
> > dereference.
> >
> > Fixes: 11851cbd60ea ("ovpn: implement TCP transport")
> > Assisted-by: Claude:claude-opus-4-7
> > Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> > ---
> > drivers/net/ovpn/tcp.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ovpn/tcp.c b/drivers/net/ovpn/tcp.c
> > index 65054cc84be5..ed4782de141a 100644
> > --- a/drivers/net/ovpn/tcp.c
> > +++ b/drivers/net/ovpn/tcp.c
> > @@ -588,7 +588,7 @@ static void ovpn_tcp_close(struct sock *sk, long timeout)
> > peer = sock->peer;
> > rcu_read_unlock();
> >
> > - ovpn_peer_del(sock->peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> > + ovpn_peer_del(peer, OVPN_DEL_PEER_REASON_TRANSPORT_DISCONNECT);
> > peer->tcp.sk_cb.prot->close(sk, timeout);
> > ovpn_peer_put(peer);
> > }
>
> I do not see how rcu_read_lock() can protect sock->peer changes.
>
> I think we need to be more careful, in the prior code which reads
> sock->peer 3 times.
>
> If RCU is used, we better use it properly.
>
> rcu_read_lock();
> sock = rcu_dereference_sk_user_data(sk);
> if (!sock || !sock->peer || !ovpn_peer_hold(sock->peer)) {
> rcu_read_unlock();
> return;
> }
> peer = sock->peer;
> rcu_read_unlock();

You're right. sock->peer is only assigned once, in ovpn_socket_new()
under lock_sock() before the ovpn_socket is even published via
rcu_assign_sk_user_data(), and never touched again - which is why the
three reads happen to be stable. But that's the kind of invariant the
next person has to go dig up, so the pattern is doing nobody any
favours.

v2 tomorrow with sock->peer read once into the local up front, and
all the subsequent uses going through that.

Same multi-read pattern shows up in ovpn_tcp_recvmsg(),
ovpn_tcp_sendmsg(), ovpn_tcp_data_ready() and ovpn_tcp_write_space()
- happy to roll those into v2 as well, or punt to a follow-up,
whichever you'd prefer.

Thanks,
David