Re: [PATCH] tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK

From: Eric Dumazet
Date: Thu Jan 02 2020 - 08:15:59 EST




On 1/1/20 11:48 PM, æéç wrote:
> Hi Eric Dumazet,
>
> I'm sorry there was a slight error in the packetdrill test case of the previous email reply,
> the ACK segment should not carry data, although this does not affect the description of the situation.
> I fixed the packetdrill test and resent it as follows:
>
> packetdrill test case:
> // Verify the "old stuff" D-SACK causing SACK to be treated as D-SACK
> --tolerance_usecs=10000
>
> // enable RACK and TLP
> 0 `sysctl -q net.ipv4.tcp_recovery=1; sysctl -q net.ipv4.tcp_early_retrans=3`
>
> // Establish a connection, rtt = 10ms
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> +0 > S. 0:0(0) ack 1 <...>
> +.01 < . 1:1(0) ack 1 win 320
> +0 accept(3, ..., ...) = 4
>
> // send 10 data segments
> +0 write(4, ..., 10000) = 10000
> +0 > P. 1:10001(10000) ack 1
>
> // send TLP
> +.02 > P. 9001:10001(1000) ack 1
>
> // enter recovery and retransmit 1:1001, now undo_marker = 1
> +.015 < . 1:1(0) ack 1 win 320 <sack 9001:10001, nop, nop>
> +0 > . 1:1001(1000) ack 1
>
> // ack 1:1001 and retransmit 1001:3001
> +.01 < . 1:1(0) ack 1001 win 320 <sack 9001:10001, nop, nop>
> +0 > . 1001:3001(2000) ack 1
>
> // sack 2001:3001, now 2001:3001 has R|S
> +.01 < . 1:1(0) ack 1001 win 320 <sack 2001:3001 9001:10001, nop, nop>
>
> +0 %{ assert tcpi_reordering == 3, tcpi_reordering }%
>
> // d-sack 1:1001, satisfies: undo_marker(1) <= start_seq < end_seq <= prior_snd_una(1001)
> // BUG: 2001:3001 is treated as D-SACK then reordering is modified in tcp_sacktag_one()
> +0 < . 1:1(0) ack 1001 win 320 <sack 1:1001 2001:3001 9001:10001, nop, nop>
>
> // reordering was modified to 8
> +0 %{ assert tcpi_reordering == 3, tcpi_reordering }%
>
>

Very nice, thanks a lot for this test !




>
> -----éäåä-----
> åää: æéç <yangpc@xxxxxxxxxx>
> åéæé: 2020å1æ1æ 19:47
> æää: 'Eric Dumazet' <edumazet@xxxxxxxxxx>
> æé: 'David Miller' <davem@xxxxxxxxxxxxx>; 'Alexey Kuznetsov' <kuznet@xxxxxxxxxxxxx>; 'Hideaki YOSHIFUJI' <yoshfuji@xxxxxxxxxxxxxx>; 'Alexei Starovoitov' <ast@xxxxxxxxxx>; 'Daniel Borkmann' <daniel@xxxxxxxxxxxxx>; 'Martin KaFai Lau' <kafai@xxxxxx>; 'Song Liu' <songliubraving@xxxxxx>; 'Yonghong Song' <yhs@xxxxxx>; 'andriin@xxxxxx' <andriin@xxxxxx>; 'netdev' <netdev@xxxxxxxxxxxxxxx>; 'LKML' <linux-kernel@xxxxxxxxxxxxxxx>
> äé: Re: [PATCH] tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
>
> Hi Eric Dumazet,
>
> Thanks for discussing this issue.
>
> 'previous sack segment was lost' means that the SACK segment carried by D-SACK will be processed by tcp_sacktag_one () due to the previous SACK loss, but this is not necessary.
>
> Here is the packetdrill test, this example shows that the reordering was modified because the SACK segment was treated as D-SACK.
>
> //dsack-old-stuff-bug.pkt
> // Verify the "old stuff" D-SACK causing SACK to be treated as D-SACK
> --tolerance_usecs=10000
>
> // enable RACK and TLP
> 0 `sysctl -q net.ipv4.tcp_recovery=1; sysctl -q net.ipv4.tcp_early_retrans=3`
>
> // Establish a connection, rtt = 10ms
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
> +0 > S. 0:0(0) ack 1 <...>
> +.01 < . 1:1(0) ack 1 win 320
> +0 accept(3, ..., ...) = 4
>
> // send 10 data segments
> +0 write(4, ..., 10000) = 10000
> +0 > P. 1:10001(10000) ack 1
>
> // send TLP
> +.02 > P. 9001:10001(1000) ack 1
>
> // enter recovery and retransmit 1:1001, now undo_marker = 1
> +.015 < . 1:1(0) ack 1 win 320 <sack 9001:10001, nop, nop>
> +0 > . 1:1001(1000) ack 1
>
> // ack 1:1001 and retransmit 1001:3001
> +.01 < . 1:1001(1000) ack 1001 win 320 <sack 9001:10001, nop, nop>
> +0 > . 1001:3001(2000) ack 1
>
> // sack 2001:3001, now 2001:3001 has R|S
> +.01 < . 1001:1001(0) ack 1001 win 320 <sack 2001:3001 9001:10001, nop, nop>
>
> +0 %{ assert tcpi_reordering == 3, tcpi_reordering }%
>
> // d-sack 1:1001, satisfies: undo_marker(1) <= start_seq < end_seq <= prior_snd_una(1001) // BUG: 2001:3001 is treated as D-SACK then reordering is modified in tcp_sacktag_one()
> +0 < . 1001:1001(0) ack 1001 win 320 <sack 1:1001 2001:3001 9001:10001, nop, nop>
>
> // reordering was modified to 8
> +0 %{ assert tcpi_reordering == 3, tcpi_reordering }%
>
>
>
>
> -----éäåä-----
> åää: Eric Dumazet <edumazet@xxxxxxxxxx>
> åéæé: 2019å12æ30æ 21:41
> æää: Pengcheng Yang <yangpc@xxxxxxxxxx>
> æé: David Miller <davem@xxxxxxxxxxxxx>; Alexey Kuznetsov <kuznet@xxxxxxxxxxxxx>; Hideaki YOSHIFUJI <yoshfuji@xxxxxxxxxxxxxx>; Alexei Starovoitov <ast@xxxxxxxxxx>; Daniel Borkmann <daniel@xxxxxxxxxxxxx>; Martin KaFai Lau <kafai@xxxxxx>; Song Liu <songliubraving@xxxxxx>; Yonghong Song <yhs@xxxxxx>; andriin@xxxxxx; netdev <netdev@xxxxxxxxxxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>
> äé: Re: [PATCH] tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK
>
> On Mon, Dec 30, 2019 at 1:55 AM Pengcheng Yang <yangpc@xxxxxxxxxx> wrote:
>>
>> When we receive a D-SACK, where the sequence number satisfies:
>> undo_marker <= start_seq < end_seq <= prior_snd_una we
>> consider this is a valid D-SACK and tcp_is_sackblock_valid() returns
>> true, then this D-SACK is discarded as "old stuff", but the variable
>> first_sack_index is not marked as negative in
>> tcp_sacktag_write_queue().
>>
>> If this D-SACK also carries a SACK that needs to be processed (for
>> example, the previous SACK segment was lost),
>
> What do you mean by ' previous sack segment was lost' ?
>
> this SACK
>> will be treated as a D-SACK in the following processing of
>> tcp_sacktag_write_queue(), which will eventually lead to incorrect
>> updates of undo_retrans and reordering.
>>
>> Fixes: fd6dad616d4f ("[TCP]: Earlier SACK block verification &
>> simplify access to them")
>> Signed-off-by: Pengcheng Yang <yangpc@xxxxxxxxxx>
>> ---
>> net/ipv4/tcp_input.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
>> 88b987c..0238b55 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -1727,8 +1727,11 @@ static int tcp_sack_cache_ok(const struct tcp_sock *tp, const struct tcp_sack_bl
>> }
>>
>> /* Ignore very old stuff early */
>> - if (!after(sp[used_sacks].end_seq, prior_snd_una))
>> + if (!after(sp[used_sacks].end_seq, prior_snd_una)) {
>> + if (i == 0)
>> + first_sack_index = -1;
>> continue;
>> + }
>>
>> used_sacks++;
>> }
>
>
> Hi Pengcheng Yang
>
> This corner case deserves a packetdrill test so that we understand the issue, can you provide one ?
>
> Thanks.
>