On Wed, Feb 19, 2025 at 9:11 PM Wang Hai <wanghai38@xxxxxxxxxx> wrote:Hi Jason,
On 2025/2/19 11:31, Jason Xing wrote:
On Wed, Feb 19, 2025 at 10:16 AM Wang Hai <wanghai38@xxxxxxxxxx> wrote:Yes, because it didn't switch to TCP_ESTABLISHED
On 2025/2/18 21:35, Eric Dumazet wrote:
On Tue, Feb 18, 2025 at 12:00 PM Wang Hai <wanghai38@xxxxxxxxxx> wrote:Hi Eric,
If two ack packets from a connection enter tcp_check_req at the same time
through different cpu, it may happen that req->ts_recent is updated with
with a more recent time and the skb with an older time creates a new sock,
which will cause the tcp_validate_incoming check to fail.
cpu1 cpu2
tcp_check_req
tcp_check_req
req->ts_recent = tmp_opt.rcv_tsval = t1
req->ts_recent = tmp_opt.rcv_tsval = t2
newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
tcp_rcv_state_process
tcp_validate_incoming
tcp_paws_check
if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed
In tcp_check_req, restore ts_recent to this skb's to fix this bug.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@xxxxxxxxxx>
---
net/ipv4/tcp_minisocks.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b089b08e9617..0208455f9eb8 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -878,6 +878,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
sock_rps_save_rxhash(child, skb);
tcp_synack_rtt_meas(child, req);
*req_stolen = !own_req;
+ if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
+ unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
+ tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
return inet_csk_complete_hashdance(sk, child, req, own_req);
Have you seen the comment at line 818 ?
/* TODO: We probably should defer ts_recent change once
* we take ownership of @req.
*/
Plan was clear and explained. Why implement something else (and buggy) ?
Currently we have a real problem, so we want to solve it. This bug
causes the upper layers to be unable to be notified to call accept after
the successful three-way handshake.
Skb from cpu1 that fails at tcp_paws_check (which it could have
succeeded) will not be able to enter the TCP_ESTABLISHED state, and
therefore parent->sk_data_ready(parent) will not be triggered, and skb
from cpu2 can complete the three-way handshake, but there is also no way
to call parent->sk_data_ready(parent) to notify the upper layer, which
will result
in the upper layer not being able to sense and call accept to obtain the
nsk.
cpu1 cpu2
tcp_check_req
tcp_check_req
req->ts_recent = tmp_opt.rcv_tsval = t1
req->ts_recent=tmp_opt.rcv_tsval= t2
newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
tcp_rcv_state_process
tcp_validate_incoming
tcp_paws_check // failed
parent->sk_data_ready(parent); // will not be called
tcp_v4_do_rcv
tcp_rcv_state_process // Complete the three-way handshake
// missing parent->sk_data_ready(parent);
IIUC, the ack received from cpu1 triggered calling
inet_csk_complete_hashdance() so its state transited from
TCP_NEW_SYN_RECV to TCP_SYN_RECV, right? If so, the reason why not
call sk_data_ready() if the skb entered into tcp_child_process() is
that its state failed to transit to TCP_ESTABLISHED?
Here is another question. How did the skb on the right side enter intoSince cpu1 successfully created the child sock, cpu2 will return
tcp_v4_do_rcv() after entering tcp_check_req() if the state of sk
which the skb belongs to is TCP_NEW_SYN_RECV? Could you elaborate more
on this point?
null in tcp_check_req and req_stolen is set to true, so that it will
subsequently go to 'goto lookup' to re-process the packet, and at
this point, sk->sk_state is already in TCP_SYN_RECV state, and then
then tcp_v4_do_rcv is called.
Now I can see what happened there. Perhaps it would be good to update
the commit message
in the next iteration.
Yes, I have considered this fix before, but the complexity of the fix would be higher.
Another key information I notice is that the second lookup process
loses the chance to call sk_data_ready() for its parent socket. It's
the one of the main reasons that cause your application to be unable
to get notified. Taking a rough look at tcp_rcv_state_process(), I
think it's not easy to acquire the parent socket there and then call
sk_data_ready() without modifying more codes compared to the current
solution. It's a different solution in theory.
I'm sorry, I may not have understood what you meant. Applying my fix, commit[1] is still necessary because it doesn't solve the bug that commit[1] fixes. can you explain in detail, why commit[1] will be reverted/overwritten.
If your new approach (like your previous reply) works, the following
commit[1] will be reverted/overwritten.