Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

From: Eric Dumazet
Date: Fri Jun 14 2024 - 05:42:04 EST


On Fri, Jun 14, 2024 at 10:59 AM luoxuanqiang <luoxuanqiang@xxxxxxxxxx> wrote:
>
> On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <luoxuanqiang@xxxxxxxxxx> wrote:
> >>
> >> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
> >> are received at the same time and processed on different CPUs, they can potentially
> >> end up creating two different reqsk (request_sock) for the same connection (the same
> >> sk) in tcp_conn_request().
> >>
> >> These two different reqsk will each respond with a SYNACK packet, and since the
> >> generation of the seq (ISN) incorporates a timestamp, the two SYNACK packets will
> >> have different seq values.
> >>
> >> The consequence is that when the Client receives the earlier SYNACK packet and
> >> replies with an ACK, the Server resets (RST) the connection.
> >>
> >> ====================================================================================
> >>
> >> This behavior is consistently reproducible in my local setup, which comprises:
> >>
> >>                  | NETA1 ------ NETB1 |
> >> PC_A --- bond ---|                    |--- bond --- PC_B
> >>                  | NETA2 ------ NETB2 |
> >>
> >> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
> >> cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPUs.
> >>
> >> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
> >> also bonded and configured in BOND_MODE_BROADCAST mode.
> >>
> >> If the client attempts a TCP connection to the server, it might encounter a failure.
> >> Capturing packets from the server side reveals:
> >>
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> >> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
> >> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
> >>
> >> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
> >>
> >> ====================================================================================
> >>
> >> The attempted solution is as follows:
> >> In tcp_conn_request(), while inserting the reqsk into the ehash table, also check
> >> whether an entry for the same connection already exists. If one is found, skip the
> >> insertion and release the duplicate reqsk.
> >>
> >> Additionally, in reqsk_queue_hash_req(), starting req->rsk_timer is deferred until
> >> after a successful insertion.
> >>
> >> Signed-off-by: luoxuanqiang <luoxuanqiang@xxxxxxxxxx>
> >> ---
> >>  include/net/inet_connection_sock.h |  2 +-
> >>  net/dccp/ipv4.c                    |  2 +-
> >>  net/dccp/ipv6.c                    |  2 +-
> >>  net/ipv4/inet_connection_sock.c    | 16 ++++++++++++----
> >>  net/ipv4/tcp_input.c               | 11 ++++++++++-
> >>  5 files changed, 25 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> >> index 7d6b1254c92d..8773d161d184 100644
> >> --- a/include/net/inet_connection_sock.h
> >> +++ b/include/net/inet_connection_sock.h
> >> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
> >>                                    struct request_sock *req,
> >>                                    struct sock *child);
> >>  void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
> >> -                                   unsigned long timeout);
> >> +                                   unsigned long timeout, bool *found_dup_sk);
> >> struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
> >> struct request_sock *req,
> >> bool own_req);
> >> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
> >> index ff41bd6f99c3..13aafdeb9205 100644
> >> --- a/net/dccp/ipv4.c
> >> +++ b/net/dccp/ipv4.c
> >> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
> >>         if (dccp_v4_send_response(sk, req))
> >>                 goto drop_and_free;
> >>
> >> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> >> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> >>         reqsk_put(req);
> >>         return 0;
> >>
> >> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
> >> index 85f4b8fdbe5e..493cdb12ce2b 100644
> >> --- a/net/dccp/ipv6.c
> >> +++ b/net/dccp/ipv6.c
> >> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
> >>         if (dccp_v6_send_response(sk, req))
> >>                 goto drop_and_free;
> >>
> >> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
> >> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
> >>         reqsk_put(req);
> >>         return 0;
> >>
> >> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> >> index d81f74ce0f02..d9394db98a5a 100644
> >> --- a/net/ipv4/inet_connection_sock.c
> >> +++ b/net/ipv4/inet_connection_sock.c
> >> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
> >> }
> >>
> >>  static void reqsk_queue_hash_req(struct request_sock *req,
> >> -                                 unsigned long timeout)
> >> +                                 unsigned long timeout, bool *found_dup_sk)
> >>  {
> >> +
> >> +       inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
> >> +       if (found_dup_sk && *found_dup_sk)
> >> +               return;
> >> +
> >> +       /* The timer needs to be set up after a successful insertion. */
> >
> >I am pretty sure we had a prior attempt to fix this issue, and the fix
> >was problematic.
> >
> >You are moving the inet_ehash_insert() before the mod_timer(), this
> >will add races.
> Could you kindly explain what "races" refers to here? Thank you!


Hmmm... maybe this is ok. Please respin your patch after fixing
checkpatch issues, and add a 'net' tag

( See https://patchwork.kernel.org/project/netdevbpf/patch/20240614060012.158026-1-luoxuanqiang@xxxxxxxxxx/
for all warnings / fails)

Please CC Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> because I will be OOO
for about 4 days.