Re: [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP

From: Maximilian Moehl

Date: Sat Apr 11 2026 - 04:18:31 EST


On Tue Mar 31, 2026 at 2:50 PM CEST, Maximilian Moehl wrote:
> On Mon Mar 30, 2026 at 9:56 AM CEST, Paolo Abeni wrote:
>> On 3/29/26 11:12 AM, Maximilian Moehl wrote:
>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>> index 8e2a6b28cea7..465fce51d017 100644
>>> --- a/net/ipv6/ip6_output.c
>>> +++ b/net/ipv6/ip6_output.c
>>> @@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
>>> *dst = NULL;
>>> }
>>>
>>> + /* If ECMP was involved the initial hash was calculated
>>> + * with saddr=:: which can result in instability
>>> + * when it is later re-calculated with the selected
>>> + * saddr. Lookup the route again with the chosen
>>> + * saddr to get a stable result.
>>> + */
>>> + if (fl6->mp_hash) {
>>> + fl6->mp_hash = 0;
>>> + dst_release(*dst);
>>> + *dst = NULL;
>>> + }
>>> +
>>> if (fl6->flowi6_oif)
>>> flags |= RT6_LOOKUP_F_IFACE;
>>> }
>>
>> This apparently breaks ipv6 fib tests (fib_tests.sh):
>>
>> # IPv6 multipath load balance test
>> # TEST: IPv6 multipath loadbalance [FAIL]
>>
>> see
>> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
>> on how to reproduce the tests.
>>
>> Also this would deserve additional testcases.
>
> Thank you for the pointer, I will look into the tests.

I've investigated the test failure. The logic I introduced causes the
packet to leave interface 1 with the address of interface 3, so it is
not picked up by the TC counter and the test fails. IPv4 does not have
this issue, nor does it have the issue I'm trying to fix for IPv6.

>> Without diving much inside the code I have the feeling this change is
>> plugged into the wrong place: multipath selection logic should be
>> encapsulated by fib6_select_path().

I looked further into how IPv4 prevents this issue from occurring.
Initially I thought it was because it does more than one route lookup,
but if I understand it correctly now, it's because of the scoring logic
in fib_select_multipath(). It adds one point for a matching hash bucket
and two points for a matching source address. Once an outgoing
interface, and with it a source address, has been selected, the flow
stays bound to that interface no matter what the hash is (unless
there's a second interface with the same address where the hash
matches, but in that case switching interfaces is probably fine?).
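
A minimal userspace sketch of the scoring described above, to make the
behaviour concrete. This is not the kernel code: struct nh_sketch,
select_path_sketch(), the integer addresses and the tie-breaking rule
(first nexthop with the highest score wins) are all assumptions made
for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one multipath nexthop. */
struct nh_sketch {
	uint32_t upper_bound;	/* upper end of this nexthop's hash bucket */
	uint32_t saddr;		/* address on the nexthop's device, 0 = none */
};

/*
 * Pick a nexthop by score: +1 if the hash falls into the nexthop's
 * bucket, +2 if the flow's source address matches the nexthop.
 * Nexthops are assumed to be sorted by upper_bound, so the hash
 * belongs to the first bucket whose upper bound is >= hash.
 * A saddr of 0 means "not chosen yet" and never matches.
 * Returns the index of the selected nexthop, or -1 if n == 0.
 */
static int select_path_sketch(const struct nh_sketch *nh, size_t n,
			      uint32_t hash, uint32_t saddr)
{
	int best = -1, best_score = -1;
	int bucket_found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int score = 0;

		if (saddr && nh[i].saddr == saddr)
			score += 2;

		if (!bucket_found && hash <= nh[i].upper_bound) {
			score += 1;
			bucket_found = 1;
		}

		/* Strictly greater: the first best-scoring nexthop wins. */
		if (score > best_score) {
			best = (int)i;
			best_score = score;
		}
	}

	return best;
}
```

With three nexthops and no source address chosen yet, only the hash
bucket decides. Once the source address of one nexthop is set, that
nexthop wins even if the recomputed hash lands in a different bucket,
which is the stability property described above.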

I will prepare a new patch that addresses this difference so that IPv6
also prefers the outgoing interface with a matching source address
over the hash bucket it would select otherwise.

--
Max