Re: [PATCH net] tcp: Fix error ts_recent time during three-way handshake

From: Wang Hai
Date: Thu Feb 20 2025 - 09:05:55 EST




On 2025/2/20 11:04, Jason Xing wrote:
On Wed, Feb 19, 2025 at 9:11 PM Wang Hai <wanghai38@xxxxxxxxxx> wrote:



On 2025/2/19 11:31, Jason Xing wrote:
On Wed, Feb 19, 2025 at 10:16 AM Wang Hai <wanghai38@xxxxxxxxxx> wrote:



On 2025/2/18 21:35, Eric Dumazet wrote:
On Tue, Feb 18, 2025 at 12:00 PM Wang Hai <wanghai38@xxxxxxxxxx> wrote:

If two ack packets from a connection enter tcp_check_req at the same time
through different cpu, it may happen that req->ts_recent is updated with
with a more recent time and the skb with an older time creates a new sock,
which will cause the tcp_validate_incoming check to fail.

cpu1 cpu2
tcp_check_req
tcp_check_req
req->ts_recent = tmp_opt.rcv_tsval = t1
req->ts_recent = tmp_opt.rcv_tsval = t2

newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
tcp_rcv_state_process
tcp_validate_incoming
tcp_paws_check
if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed

In tcp_check_req, restore ts_recent to this skb's to fix this bug.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@xxxxxxxxxx>
---
net/ipv4/tcp_minisocks.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b089b08e9617..0208455f9eb8 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -878,6 +878,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
sock_rps_save_rxhash(child, skb);
tcp_synack_rtt_meas(child, req);
*req_stolen = !own_req;
+ if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
+ unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
+ tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
return inet_csk_complete_hashdance(sk, child, req, own_req);

Have you seen the comment at line 818 ?

/* TODO: We probably should defer ts_recent change once
* we take ownership of @req.
*/

Plan was clear and explained. Why implement something else (and buggy) ?

Hi Eric,

Currently we have a real problem, so we want to solve it. This bug
causes the upper layers to be unable to be notified to call accept after
the successful three-way handshake.

Skb from cpu1 that fails at tcp_paws_check (which it could have
succeeded) will not be able to enter the TCP_ESTABLISHED state, and
therefore parent->sk_data_ready(parent) will not be triggered, and skb
from cpu2 can complete the three-way handshake, but there is also no way
to call parent->sk_data_ready(parent) to notify the upper layer, which
will result
in the upper layer not being able to sense and call accept to obtain the
nsk.

cpu1 cpu2
tcp_check_req
tcp_check_req
req->ts_recent = tmp_opt.rcv_tsval = t1
req->ts_recent=tmp_opt.rcv_tsval= t2

newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
tcp_rcv_state_process
tcp_validate_incoming
tcp_paws_check // failed
parent->sk_data_ready(parent); // will not be called
tcp_v4_do_rcv
tcp_rcv_state_process // Complete the three-way handshake
// missing parent->sk_data_ready(parent);

IIUC, the ack received from cpu1 triggered calling
inet_csk_complete_hashdance() so its state transited from
TCP_NEW_SYN_RECV to TCP_SYN_RECV, right? If so, the reason why not
call sk_data_ready() if the skb entered into tcp_child_process() is
that its state failed to transit to TCP_ESTABLISHED?

Yes, because it didn't switch to TCP_ESTABLISHED
Here is another question. How did the skb on the right side enter into
tcp_v4_do_rcv() after entering tcp_check_req() if the state of sk
which the skb belongs to is TCP_NEW_SYN_RECV? Could you elaborate more
on this point?
Since cpu1 successfully created the child sock, cpu2 will return
null in tcp_check_req and req_stolen is set to true, so that it will
subsequently go to 'goto lookup' to re-process the packet, and at
this point, sk->sk_state is already in TCP_SYN_RECV state, and then
then tcp_v4_do_rcv is called.

Now I can see what happened there. Perhaps it would be good to update
the commit message
in the next iteration.
Hi Jason,

Thanks for the suggestion, I'll test it out and improve the commit message to send v2.

Another key information I notice is that the second lookup process
loses the chance to call sk_data_ready() for its parent socket. It's
the one of the main reasons that cause your application to be unable
to get notified. Taking a rough look at tcp_rcv_state_process(), I
think it's not easy to acquire the parent socket there and then call
sk_data_ready() without modifying more codes compared to the current
solution. It's a different solution in theory.
Yes, I have considered this fix before, but the complexity of the fix would be higher.

If your new approach (like your previous reply) works, the following
commit[1] will be reverted/overwritten.
I'm sorry, I may not have understood what you meant. Applying my fix, commit[1] is still necessary because it doesn't solve the bug that commit[1] fixes. can you explain in detail, why commit[1] will be reverted/overwritten.

Thanks,
Wang