Re: [PATCH net] net: ensure all external references are released in deferred skbuffs
From: Ilya Maximets
Date: Wed Jun 22 2022 - 17:13:06 EST
On 6/22/22 21:27, Eric Dumazet wrote:
> On Wed, Jun 22, 2022 at 9:04 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>>
>> On Wed, Jun 22, 2022 at 8:19 PM Ilya Maximets <i.maximets@xxxxxxx> wrote:
>>>
>>> On 6/22/22 19:03, Eric Dumazet wrote:
>>>> On Wed, Jun 22, 2022 at 6:47 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Wed, Jun 22, 2022 at 6:39 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Wed, Jun 22, 2022 at 6:29 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Wed, Jun 22, 2022 at 4:26 PM Ilya Maximets <i.maximets@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 6/22/22 13:43, Eric Dumazet wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I tested the patch below and it seems to fix the issue seen
>>>>>>>> with OVS testsuite. Though it's not obvious for me why this
>>>>>>>> happens. Can you explain a bit more?
>>>>>>>
>>>>>>> Anyway, I am not sure we can call nf_reset_ct(skb) that early.
>>>>>>>
>>>>>>> git log seems to say that xfrm check needs to be done before
>>>>>>> nf_reset_ct(skb), I have no idea why.
>>>>>>
>>>>>> Additional remark: In IPv6 side, xfrm6_policy_check() _is_ called
>>>>>> after nf_reset_ct(skb)
>>>>>>
>>>>>> Steffen, do you have some comments ?
>>>>>>
>>>>>> Some context:
>>>>>> commit b59c270104f03960069596722fea70340579244d
>>>>>> Author: Patrick McHardy <kaber@xxxxxxxxx>
>>>>>> Date: Fri Jan 6 23:06:10 2006 -0800
>>>>>>
>>>>>> [NETFILTER]: Keep conntrack reference until IPsec policy checks are done
>>>>>>
>>>>>> Keep the conntrack reference until policy checks have been performed for
>>>>>> IPsec NAT support. The reference needs to be dropped before a packet is
>>>>>> queued to avoid having the conntrack module unloadable.
>>>>>>
>>>>>> Signed-off-by: Patrick McHardy <kaber@xxxxxxxxx>
>>>>>> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>>>>>>
>>>>>
>>>>> Oh well... __xfrm_policy_check() has :
>>>>>
>>>>> nf_nat_decode_session(skb, &fl, family);
>>>>>
>>>>> This answers my questions.
>>>>>
>>>>> This means we are probably missing at least one XFRM check in TCP
>>>>> stack in some cases.
>>>>> (Only after adding this XFRM check we can call nf_reset_ct(skb))
>>>>>
>>>>
>>>> Maybe this will help ?
>>>
>>> I tested this patch and it seems to fix the OVS problem.
>>> I did not test the xfrm part of it.
>>>
>>> Will you post an official patch?
>>
>> Yes I will. I need to double check we do not leak either the req, or the child.
>>
>> Maybe the XFRM check should be done even earlier, on the listening socket ?
>>
>> Or if we assume the SYNACK packet has been sent after the XFRM test
>> has been applied to the SYN,
>> maybe we could just call nf_reset_ct(skb) to lower risk of regressions.
>>
>> With the last patch, it would be strange that we accept the 3WHS and
>> establish a socket,
>> but drop the payload in the 3rd packet...
>
> Ilya, can you test the following patch ?
Tested with OVS and it works fine; the issue no longer appears.
I still haven't tested the xfrm part, as I'm not sure how.
> I think it makes more sense to let XFRM reject the packet earlier, and
> not complete the 3WHS,
> if for some reason this happens.
OK. However, the patch now looks more like two separate fixes.
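To make the ordering constraint from this thread concrete, here is a rough user-space sketch. It is not kernel code: every toy_* name is hypothetical and only mimics the idea that the policy check has to read the conntrack NAT state (as nf_nat_decode_session() does inside __xfrm_policy_check()) before the conntrack reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry referenced by a packet. */
struct toy_ct {
	int refcnt;        /* external reference held through the packet */
	bool nat_applied;  /* NAT info the policy check wants to decode  */
};

/* Hypothetical stand-in for an skb; only the conntrack pointer is modeled. */
struct toy_skb {
	struct toy_ct *ct;
};

/*
 * Toy policy check (assumption: the policy either expects NAT or not).
 * It can only see NAT info while the packet still references conntrack.
 */
static bool toy_policy_check(const struct toy_skb *skb, bool policy_expects_nat)
{
	bool saw_nat = skb->ct != NULL && skb->ct->nat_applied;

	return saw_nat == policy_expects_nat;
}

/* Toy equivalent of dropping the conntrack reference held by the packet. */
static void toy_reset_ct(struct toy_skb *skb)
{
	if (skb->ct != NULL) {
		skb->ct->refcnt--;
		skb->ct = NULL;
	}
}

/*
 * Receive path in the order discussed here: reject early if the policy
 * check fails, and release the external reference only afterwards.
 * Returns true when the packet is accepted.
 */
static bool toy_rcv(struct toy_skb *skb, bool policy_expects_nat)
{
	if (!toy_policy_check(skb, policy_expects_nat))
		return false;

	toy_reset_ct(skb);
	return true;
}
```

In this model, calling toy_reset_ct() before toy_policy_check() makes a NAT-expecting policy fail, because the NAT state is no longer reachable from the packet. That is the same reason the reset cannot simply be moved earlier without also doing the policy check first.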
>
> Thanks !
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fe8f23b95d32ca4a35d05166d471327bc608fa91..da5a3c44c4fb70f1d3ecc596e694a86267f1c44a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1964,7 +1964,10 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  			struct sock *nsk;
>
>  			sk = req->rsk_listener;
> -			drop_reason = tcp_inbound_md5_hash(sk, skb,
> +			if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
> +				drop_reason = SKB_DROP_REASON_XFRM_POLICY;
> +			else
> +				drop_reason = tcp_inbound_md5_hash(sk, skb,
>  						   &iph->saddr, &iph->daddr,
>  						   AF_INET, dif, sdif);
>  			if (unlikely(drop_reason)) {
> @@ -2016,6 +2019,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  				}
>  				goto discard_and_relse;
>  			}
> +			nf_reset_ct(skb);
>  			if (nsk == sk) {
>  				reqsk_put(req);
>  				tcp_v4_restore_cb(skb);