Re: [PATCH v1 bpf-next 05/11] tcp: Migrate TCP_NEW_SYN_RECV requests.

From: Kuniyuki Iwashima
Date: Thu Dec 03 2020 - 09:14:29 EST


From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Tue, 1 Dec 2020 16:13:39 +0100
> On 12/1/20 3:44 PM, Kuniyuki Iwashima wrote:
> > This patch renames reuseport_select_sock() to __reuseport_select_sock() and
> > adds two wrapper function of it to pass the migration type defined in the
> > previous commit.
> >
> > reuseport_select_sock : BPF_SK_REUSEPORT_MIGRATE_NO
> > reuseport_select_migrated_sock : BPF_SK_REUSEPORT_MIGRATE_REQUEST
> >
> > As mentioned before, we have to select a new listener for TCP_NEW_SYN_RECV
> > requests at receiving the final ACK or sending a SYN+ACK. Therefore, this
> > patch also changes the code to call reuseport_select_migrated_sock() even
> > if the listening socket is TCP_CLOSE. If we can pick out a listening socket
> > from the reuseport group, we rewrite request_sock.rsk_listener and resume
> > processing the request.
> >
> > Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> > ---
> > include/net/inet_connection_sock.h | 12 +++++++++++
> > include/net/request_sock.h | 13 ++++++++++++
> > include/net/sock_reuseport.h | 8 +++----
> > net/core/sock_reuseport.c | 34 ++++++++++++++++++++++++------
> > net/ipv4/inet_connection_sock.c | 13 ++++++++++--
> > net/ipv4/tcp_ipv4.c | 9 ++++++--
> > net/ipv6/tcp_ipv6.c | 9 ++++++--
> > 7 files changed, 81 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> > index 2ea2d743f8fc..1e0958f5eb21 100644
> > --- a/include/net/inet_connection_sock.h
> > +++ b/include/net/inet_connection_sock.h
> > @@ -272,6 +272,18 @@ static inline void inet_csk_reqsk_queue_added(struct sock *sk)
> > reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue);
> > }
> >
> > +static inline void inet_csk_reqsk_queue_migrated(struct sock *sk,
> > + struct sock *nsk,
> > + struct request_sock *req)
> > +{
> > + reqsk_queue_migrated(&inet_csk(sk)->icsk_accept_queue,
> > + &inet_csk(nsk)->icsk_accept_queue,
> > + req);
> > + sock_put(sk);
> > + sock_hold(nsk);
>
> This looks racy to me. nsk refcount might be zero at this point.
>
> If you think it can _not_ be zero, please add a big comment here,
> because this would mean something has been done before reaching this function,
> and this sock_hold() would be not needed in the first place.
>
> There is a good reason reqsk_alloc() is using refcount_inc_not_zero().

Exactly, I will fix this in the next spin like below.
Thank you.

---8<---
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 1e0958f5eb21..d8c3be31e987 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -280,7 +280,6 @@ static inline void inet_csk_reqsk_queue_migrated(struct sock *sk,
&inet_csk(nsk)->icsk_accept_queue,
req);
sock_put(sk);
- sock_hold(nsk);
req->rsk_listener = nsk;
}

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 6b475897b496..4d07bddcf678 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -386,7 +386,14 @@ EXPORT_SYMBOL(reuseport_select_sock);
struct sock *reuseport_select_migrated_sock(struct sock *sk, u32 hash,
struct sk_buff *skb)
{
- return __reuseport_select_sock(sk, hash, skb, 0, BPF_SK_REUSEPORT_MIGRATE_REQUEST);
+ struct sock *nsk;
+
+ nsk = __reuseport_select_sock(sk, hash, skb, 0, BPF_SK_REUSEPORT_MIGRATE_REQUEST);
+ if (IS_ERR_OR_NULL(nsk) ||
+ unlikely(!refcount_inc_not_zero(&nsk->sk_refcnt)))
+ return NULL;
+
+ return nsk;
}
EXPORT_SYMBOL(reuseport_select_migrated_sock);

---8<---


> > + req->rsk_listener = nsk;
> > +}
> > +
>
> Honestly, this patch series looks quite complex, and finding a bug in the
> very first function I am looking at is not really a good sign...

I also think this issue is quite complex, but it might be easier to fix
than it was disscussed in 2015 [1] thanks to your some refactoring.

https://lore.kernel.org/netdev/1443313848-751-1-git-send-email-tolga.ceylan@xxxxxxxxx/