Re: [PATCH net] tcp: avoid creating multiple req socks with the same tuples
From: Eric Dumazet
Date: Wed Jun 05 2019 - 10:22:50 EST
On 6/5/19 1:52 AM, Zhiqiang Liu wrote:
>
>
> å 2019/6/4 23:24, Eric Dumazet åé:
>> On Tue, Jun 4, 2019 at 7:47 AM Mao Wenan <maowenan@xxxxxxxxxx> wrote:
>>>
>>> There is one issue about bonding mode BOND_MODE_BROADCAST, and
>>> two slaves with diffierent affinity, so packets will be handled
>>> by different cpu. These are two pre-conditions in this case.
>
>>> Signed-off-by: Mao Wenan <maowenan@xxxxxxxxxx>
>>> --
>>
>> This issue has been discussed last year.
>>
>> I am afraid your patch does not solve all races.
>>
>> The lookup you add is lockless, so this is racy.
>>
>> Really the only way to solve this is to make sure that _when_ the
>> bucket lock is held,
>> we do not insert a request socket if the 4-tuple is already in the
>> chain (probably in inet_ehash_insert())
>>
>> This needs more tricky changes than your patch.
>>
>
> This kind case is rarely used, and the condition of the issue is strict.
> If we add the "lookup" before or in inet_ehash_insert func for each reqsk,
> overall performance will be affected.
>
> We may solve the small probability issue with a trick in the tcp_v4_rcv.
> If the ACK is invalid checked by tcp_check_req func, the req could be dropped,
> and then goto the lookup for searching another avaliable reqsk. In this way,
> the performance will not be affected in the normal process.
>
> The patch is given as following:
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a2896944aa37..9d0491587ed2 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1874,8 +1874,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
> goto discard_and_relse;
> }
> if (nsk == sk) {
> - reqsk_put(req);
> + inet_csk_reqsk_queue_drop_and_put(sk, req);
> tcp_v4_restore_cb(skb);
> + sock_put(sk);
> + goto lookup;
> } else if (tcp_child_process(sk, nsk, skb)) {
> tcp_v4_send_reset(nsk, skb);
> goto discard_and_relse;
>
This is not solving the race.
Please read again my prior emails.
If you want to work on this issue, you have to fix it for good.
Thanks.