Re: [linus:master] [selftests] 8ae9efb859: kernel-selftests.net.fib_tests.sh.fail

From: Ido Schimmel
Date: Sun Oct 01 2023 - 10:51:06 EST


On Mon, Sep 25, 2023 at 06:18:34PM +0000, Sriram Yagnaraman wrote:
> CC: Ido, who helped a lot with writing these tests.
>
> > -----Original Message-----
> > From: kernel test robot <oliver.sang@xxxxxxxxx>
> > Sent: Tuesday, 19 September 2023 10:32
> > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > Cc: oe-lkp@xxxxxxxxxxxxxxx; lkp@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David
> > S. Miller <davem@xxxxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
> > oliver.sang@xxxxxxxxx
> > Subject: [linus:master] [selftests] 8ae9efb859: kernel-
> > selftests.net.fib_tests.sh.fail
> >
> >
> > hi, Sriram Yagnaraman,
> >
> > we noticed two new added tests failed in our test environment.
> > want to consult with you what's the dependency and requirement to run
> > them?
> > Thanks a lot!
>
> Sorry for the delayed response. I will look at this and get back.
> I am not an expert with lkp-tests but will try to set it up on my local environment and reproduce the problem.
>
> >
> > Hello,
> >
> > kernel test robot noticed "kernel-selftests.net.fib_tests.sh.fail" on:
> >
> > commit: 8ae9efb859c05a54ac92b3336c6ca0597c9c8cdb ("selftests: fib_tests:
> > Add multipath list receive tests")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: kernel-selftests
> > version: kernel-selftests-x86_64-60acb023-1_20230329
> > with following parameters:
> >
> > group: net
> >
> >
> >
> > compiler: gcc-12
> > test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @
> > 3.00GHz (Cascade Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of the
> > same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > | Closes:
> > | https://lore.kernel.org/oe-lkp/202309191658.c00d8b8-oliver.sang@intel.
> > | com
> >
> >
> >
> > # timeout set to 1500
> > # selftests: net: fib_tests.sh
> > #
> > # Single path route test
> > # Start point
> > # TEST: IPv4 fibmatch [ OK ]
> > # TEST: IPv6 fibmatch [ OK ]
> > # Nexthop device deleted
> > # TEST: IPv4 fibmatch - no route [ OK ]
> > # TEST: IPv6 fibmatch - no route [ OK ]
> >
> > ...
> >
> > #
> > # Fib6 garbage collection test
> > # TEST: ipv6 route garbage collection [ OK ]
> > #
> > # IPv4 multipath list receive tests
> > # TEST: Multipath route hit ratio (.06) [FAIL]
> > #
> > # IPv6 multipath list receive tests
> > # TEST: Multipath route hit ratio (.10) [FAIL]

I found two possible problems. The first is that in the IPv4 case we
might get more tracepoint hits than packets (ratio higher than 1)
because source validation (rp_filter) performs additional FIB lookups.
Fixed by disabling source validation:

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index e7d2a530618a..66d0db7a2614 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -2437,6 +2437,9 @@ ipv4_mpath_list_test()
 	run_cmd "ip -n ns2 route add 203.0.113.0/24
 		nexthop via 172.16.201.2 nexthop via 172.16.202.2"
 	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.fib_multipath_hash_policy=1"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.veth2.rp_filter=0"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=0"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.default.rp_filter=0"
 	set +e
 
 	local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
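
For context on why the hunk above clears both the per-interface and the
"all" setting: according to the kernel's ip-sysctl documentation, the
value used for source validation on an interface is the maximum of
conf/all/rp_filter and conf/<interface>/rp_filter. A minimal sketch of
that rule follows. The helper name effective_rp_filter is hypothetical
and is not part of fib_tests.sh.

```shell
#!/bin/sh
# Hypothetical helper (not in fib_tests.sh): print the rp_filter value
# that applies to an interface, given the "all" and per-interface
# settings. The effective value is the maximum of the two.
effective_rp_filter()
{
	all=$1
	iface=$2

	if [ "$all" -gt "$iface" ]; then
		echo "$all"
	else
		echo "$iface"
	fi
}

# Example with live values (assumes namespace ns2 and device veth2 exist):
# effective_rp_filter \
#	"$(ip netns exec ns2 sysctl -n net.ipv4.conf.all.rp_filter)" \
#	"$(ip netns exec ns2 sysctl -n net.ipv4.conf.veth2.rp_filter)"
```

If either input is non-zero, source validation is still active on the
interface, which is why setting only one of the two knobs to 0 would not
be enough.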

The second problem (which I believe is the one you encountered) is that
we might miss tracepoint hits when the lookups happen in the ksoftirqd
task instead of the mausezahn task, since perf only counts the command
it launched. Fixed by counting system-wide (perf stat -a):

@@ -2449,7 +2452,7 @@ ipv4_mpath_list_test()
 	# words, the FIB lookup tracepoint needs to be triggered for every
 	# packet.
 	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
-	run_cmd "perf stat -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	run_cmd "perf stat -a -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
 	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
 	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
 	list_rcv_eval $tmp_file $diff
@@ -2494,7 +2497,7 @@ ipv6_mpath_list_test()
 	# words, the FIB lookup tracepoint needs to be triggered for every
 	# packet.
 	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
-	run_cmd "perf stat -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	run_cmd "perf stat -a -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
 	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
 	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
 	list_rcv_eval $tmp_file $diff
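
The comparison that both hunks feed into boils down to "tracepoint hits
divided by received packets should be close to 1". A minimal sketch of
that check is below. The helper name hit_ratio_ok and the caller-supplied
threshold are illustrative assumptions; the real list_rcv_eval in
fib_tests.sh may parse the perf output and compare differently.

```shell
#!/bin/sh
# Hypothetical helper (not the list_rcv_eval from fib_tests.sh):
# succeed when hits / packets is at least min_ratio.
#   $1 - number of tracepoint hits reported by perf
#   $2 - number of packets received on the interface
#   $3 - minimum acceptable ratio (assumed, caller-chosen)
hit_ratio_ok()
{
	hits=$1
	pkts=$2
	min_ratio=$3

	# awk does the floating point comparison; a zero packet count is
	# treated as a failure instead of dividing by zero.
	awk -v h="$hits" -v p="$pkts" -v m="$min_ratio" \
		'BEGIN { exit !(p > 0 && h / p >= m) }'
}
```

With per-task counting, lookups done from ksoftirqd are not attributed to
mausezahn, so the hit count (and therefore the ratio) can come out far
below 1 even though every packet triggered a lookup.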

Ran both tests in a loop:

# for i in $(seq 1 20); do ./fib_tests.sh -t ipv4_mpath_list; done
# for i in $(seq 1 20); do ./fib_tests.sh -t ipv6_mpath_list; done

And verified that the results are stable. Also verified that the tests
reliably fail when reverting both fixes:

8423be8926aa ipv6: ignore dst hint for multipath routes
6ac66cb03ae3 ipv4: ignore dst hint for multipath routes
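
When repeating the runs, a small wrapper can report how many iterations
failed. This is only a sketch: the helper name count_failures is
hypothetical, and it assumes the command under test exits non-zero on
failure.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times, discard its output, and
# print the number of runs that exited with a non-zero status.
#   $1   - number of iterations
#   $2.. - command and arguments to run
count_failures()
{
	n=$1
	shift
	failures=0
	i=0

	while [ "$i" -lt "$n" ]; do
		"$@" >/dev/null 2>&1 || failures=$((failures + 1))
		i=$((i + 1))
	done
	echo "$failures"
}

# Example (as root, from tools/testing/selftests/net):
# count_failures 20 ./fib_tests.sh -t ipv4_mpath_list
# count_failures 20 ./fib_tests.sh -t ipv6_mpath_list
```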

Can you please test with the proposed modifications?

Thanks