Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

From: David Ahern
Date: Mon Sep 21 2020 - 15:12:04 EST


On 9/21/20 12:44 PM, Mathieu Desnoyers wrote:
> ----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@xxxxxxxxx wrote:
>
>> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote:
>>> Hi,
>>>
>>> Here is an updated series of fixes for ipv4 and ipv6 which ensure
>>> the route lookup is performed on the right routing table in VRF
>>> configurations when sending TTL expired icmp errors (useful for
>>> traceroute).
>>>
>>> It includes tests for both ipv4 and ipv6.
>>>
>>> These fixes specifically address the code paths involved in
>>> sending TTL expired icmp errors. As detailed in the individual commit
>>> messages, those fixes do not address similar issues related to network
>>> namespaces and unreachable / fragmentation needed messages, which appear
>>> to use different code paths.
>>>
>>
>> New selftests are failing:
>> TEST: Ping received ICMP frag needed [FAIL]
>>
>> Both IPv4 and IPv6 versions are failing.
>
> Indeed, this situation is discussed in each patch commit message:
>
> ipv4:
>
> [ It has also been pointed out that a similar issue exists with
> unreachable / fragmentation needed messages, which can be triggered by
> changing the MTU of eth1 in r1 to 1400 and running:
>
> ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
>
> Some investigation points to raw_icmp_error() and raw_err() as being
> involved in this last scenario. The focus of this patch is TTL expired
> ICMP messages, which go through icmp_route_lookup.
> Investigation of failure modes related to raw_icmp_error() is beyond
> this investigation's scope. ]
>
> ipv6:
>
> [ Testing shows that similar issues exist with ipv6 unreachable /
> fragmentation needed messages. However, investigation of this
> additional failure mode is beyond this investigation's scope. ]
>
> I do not have the time to investigate further unfortunately, so I
> thought it best to post what I have.
>

The test setup is bad. You have r1 dropping the MTU in VRF red, but you
are not telling VRF red how to send the ICMP error back. For example,
for IPv4 add:

ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue

Do the same for IPv6.
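
A minimal sketch of adding both return routes, assuming the namespace,
VRF and device names used in the command above. The IPv6 prefix is a
placeholder from the documentation range (an assumption, not taken from
the selftest); substitute the prefix of the h1 side. The RUN variable
and the add_return_route helper are hypothetical names introduced only
for this illustration. By default the commands are printed, not run.

```shell
#!/bin/sh
# Sketch: give VRF red in r1 a route back to the h1 subnet via the
# device in VRF blue, so ICMP errors generated in red can be returned.

# Default is a dry run that prints each command.
# Run with RUN= (empty) as root to actually execute them.
RUN=${RUN-echo}

# Placeholder IPv6 prefix (assumption); replace with the h1-side prefix.
V6_PREFIX=${V6_PREFIX-2001:db8:1::/64}

add_return_route() {
    # $1: address family flag (-4 or -6)
    # $2: prefix of the h1 side that VRF red must be able to reach
    $RUN ip "$1" -netns r1 route add vrf red "$2" dev blue
}

add_return_route -4 172.16.1.0/24
add_return_route -6 "$V6_PREFIX"
```

Printing first makes it easy to check the routes before touching the
namespaces in the test setup.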

Also, I do not see a reason for r2; I suggest dropping it. What you are
testing is ICMP crossing VRFs with route leaking, so there should not be
a need for r2, which leads to asymmetric routing (172.16.1.0 via r1 and
the return via r2).