Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

From: Michael Jeanson
Date: Tue Sep 22 2020 - 10:00:13 EST


----- On 21 Sep, 2020, at 15:33, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> ----- On Sep 21, 2020, at 3:11 PM, David Ahern dsahern@xxxxxxxxx wrote:
>
>> On 9/21/20 12:44 PM, Mathieu Desnoyers wrote:
>>> ----- On Sep 21, 2020, at 2:36 PM, David Ahern dsahern@xxxxxxxxx wrote:
>>>
>>>> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote:
>>>>> Hi,
>>>>>
>>>>> Here is an updated series of fixes for ipv4 and ipv6 which ensure
>>>>> the route lookup is performed on the right routing table in VRF
>>>>> configurations when sending TTL expired icmp errors (useful for
>>>>> traceroute).
>>>>>
>>>>> It includes tests for both ipv4 and ipv6.
>>>>>
>>>>> These fixes specifically address the code paths involved in
>>>>> sending TTL expired icmp errors. As detailed in the individual commit
>>>>> messages, those fixes do not address similar issues related to network
>>>>> namespaces and unreachable / fragmentation needed messages, which appear
>>>>> to use different code paths.
>>>>>
>>>>
>>>> New selftests are failing:
>>>> TEST: Ping received ICMP frag needed [FAIL]
>>>>
>>>> Both IPv4 and IPv6 versions are failing.
>>>
>>> Indeed, this situation is discussed in each patch commit message:
>>>
>>> ipv4:
>>>
>>> [ It has also been pointed out that a similar issue exists with
>>> unreachable / fragmentation needed messages, which can be triggered by
>>> changing the MTU of eth1 in r1 to 1400 and running:
>>>
>>> ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
>>>
>>> Some investigation points to raw_icmp_error() and raw_err() as being
>>> involved in this last scenario. The focus of this patch is TTL expired
>>> ICMP messages, which go through icmp_route_lookup.
>>> Investigation of failure modes related to raw_icmp_error() is beyond
>>> this investigation's scope. ]
>>>
>>> ipv6:
>>>
>>> [ Testing shows that similar issues exist with ipv6 unreachable /
>>> fragmentation needed messages. However, investigation of this
>>> additional failure mode is beyond this investigation's scope. ]
>>>
>>> I do not have the time to investigate further unfortunately, so I
>>> thought it best to post what I have.
>>>
>>
>> the test setup is bad. You have r1 dropping the MTU in VRF red, but not
>> telling VRF red how to send back the ICMP. e.g., for IPv4 add:
>>
>> ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue
>>
>> do the same for v6.
>>
>> Also, I do not see a reason for r2; I suggest dropping it. What you are
>> testing is icmp crossing VRF with route leaking, so there should not be
>> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and
>> the return via r2).

The objective of the test was to replicate a client's environment where
packets cross from a VRF that has a route back to the source into one
that doesn't, while reaching a TTL of 0. If the route lookup for the
icmp error is done on the interface in the first VRF, the error can be
routed back to the source. If it is done on the interface in the second
VRF, it cannot, and that is the current behaviour for icmp errors
generated while crossing between VRFs.

There may be a better test case for this that doesn't involve asymmetric
routing, but this is the only way I found to replicate the problem.
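
To make the difference between the two lookups concrete, here is a rough
toy model in Python. It is not kernel code and not taken from the
selftest: the table contents, the device names, the source address
172.16.1.1, and the names VRF_TABLES, lookup() and icmp_error_route() are
all made up for illustration. It only mimics a per-VRF longest-prefix
match to show why the choice of VRF for the lookup matters.

```python
import ipaddress

# Hypothetical per-VRF routing tables (prefix, output device).
# "first" has a route back to the source network, "second" does not.
VRF_TABLES = {
    "first": [("172.16.1.0/24", "eth-first")],
    "second": [("172.16.2.0/24", "eth-second")],
}


def lookup(vrf, dst):
    """Longest-prefix match for dst in the table of one VRF.

    Returns the output device name, or None if no route matches.
    """
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in VRF_TABLES[vrf]:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


def icmp_error_route(ingress_vrf, egress_vrf, src, use_ingress):
    """Pick the VRF used to route an ICMP error back to src.

    use_ingress=True models a lookup on the interface in the first VRF,
    use_ingress=False models a lookup on the interface in the second VRF.
    """
    vrf = ingress_vrf if use_ingress else egress_vrf
    return lookup(vrf, src)


if __name__ == "__main__":
    src = "172.16.1.1"  # assumed address of the original sender
    # Lookup in the first VRF finds a route back to the source.
    print(icmp_error_route("first", "second", src, use_ingress=True))
    # Lookup in the second VRF finds nothing, so the error is lost.
    print(icmp_error_route("first", "second", src, use_ingress=False))
```

In this model the first call returns a device and the second returns
None, which corresponds to the TTL expired error never reaching the
source when the lookup is done in the second VRF.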