Re: [linus:master] [selftests] 8ae9efb859: kernel-selftests.net.fib_tests.sh.fail
From: Oliver Sang
Date: Sat Oct 07 2023 - 03:23:46 EST
hi, Ido Schimmel,
On Sun, Oct 01, 2023 at 05:50:20PM +0300, Ido Schimmel wrote:
> On Mon, Sep 25, 2023 at 06:18:34PM +0000, Sriram Yagnaraman wrote:
> > CC: Ido, who helped a lot with writing these tests.
> >
> > > -----Original Message-----
> > > From: kernel test robot <oliver.sang@xxxxxxxxx>
> > > Sent: Tuesday, 19 September 2023 10:32
> > > To: Sriram Yagnaraman <sriram.yagnaraman@xxxxxxxx>
> > > Cc: oe-lkp@xxxxxxxxxxxxxxx; lkp@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David
> > > S. Miller <davem@xxxxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
> > > oliver.sang@xxxxxxxxx
> > > Subject: [linus:master] [selftests] 8ae9efb859: kernel-selftests.net.fib_tests.sh.fail
> > >
> > >
> > > hi, Sriram Yagnaraman,
> > >
> > > we noticed that two newly added tests failed in our test environment.
> > > Could you tell us what dependencies and requirements are needed to run
> > > them?
> > > Thanks a lot!
> >
> > Sorry for the delayed response. I will look at this and get back.
> > I am not an expert with lkp-tests, but I will try to set it up in my local environment and reproduce the problem.
> >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed "kernel-selftests.net.fib_tests.sh.fail" on:
> > >
> > > commit: 8ae9efb859c05a54ac92b3336c6ca0597c9c8cdb ("selftests: fib_tests:
> > > Add multipath list receive tests")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > in testcase: kernel-selftests
> > > version: kernel-selftests-x86_64-60acb023-1_20230329
> > > with following parameters:
> > >
> > > group: net
> > >
> > >
> > >
> > > compiler: gcc-12
> > > test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @
> > > 3.00GHz (Cascade Lake) with 32G memory
> > >
> > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > >
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of the
> > > same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > | Closes: https://lore.kernel.org/oe-lkp/202309191658.c00d8b8-oliver.sang@intel.com
> > >
> > >
> > >
> > > # timeout set to 1500
> > > # selftests: net: fib_tests.sh
> > > #
> > > # Single path route test
> > > # Start point
> > > # TEST: IPv4 fibmatch [ OK ]
> > > # TEST: IPv6 fibmatch [ OK ]
> > > # Nexthop device deleted
> > > # TEST: IPv4 fibmatch - no route [ OK ]
> > > # TEST: IPv6 fibmatch - no route [ OK ]
> > >
> > > ...
> > >
> > > #
> > > # Fib6 garbage collection test
> > > # TEST: ipv6 route garbage collection [ OK ]
> > > #
> > > # IPv4 multipath list receive tests
> > > # TEST: Multipath route hit ratio (.06) [FAIL]
> > > #
> > > # IPv6 multipath list receive tests
> > > # TEST: Multipath route hit ratio (.10) [FAIL]
>
> I found two possible problems. The first is that in the IPv4 case we
> might get more trace point hits than packets (ratio higher than 1)
> because of the additional FIB lookups for source validation. Fixed by
> disabling source validation:
>
> diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
> index e7d2a530618a..66d0db7a2614 100755
> --- a/tools/testing/selftests/net/fib_tests.sh
> +++ b/tools/testing/selftests/net/fib_tests.sh
> @@ -2437,6 +2437,9 @@ ipv4_mpath_list_test()
> run_cmd "ip -n ns2 route add 203.0.113.0/24
> nexthop via 172.16.201.2 nexthop via 172.16.202.2"
> run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.fib_multipath_hash_policy=1"
> + run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.veth2.rp_filter=0"
> + run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=0"
> + run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.default.rp_filter=0"
> set +e
>
> local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
>
> The second problem (which I believe is the one you encountered) is that
> we might miss certain trace point hits if they happen from the ksoftirqd
> task instead of the mausezahn task. Fixed by:
>
> @@ -2449,7 +2452,7 @@ ipv4_mpath_list_test()
> # words, the FIB lookup tracepoint needs to be triggered for every
> # packet.
> local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> - run_cmd "perf stat -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> + run_cmd "perf stat -a -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
> list_rcv_eval $tmp_file $diff
> @@ -2494,7 +2497,7 @@ ipv6_mpath_list_test()
> # words, the FIB lookup tracepoint needs to be triggered for every
> # packet.
> local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> - run_cmd "perf stat -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> + run_cmd "perf stat -a -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
> local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
> local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
> list_rcv_eval $tmp_file $diff
>
> Ran both tests in a loop:
>
> # for i in $(seq 1 20); do ./fib_tests.sh -t ipv4_mpath_list; done
> # for i in $(seq 1 20); do ./fib_tests.sh -t ipv6_mpath_list; done
>
> And verified that the results are stable. Also verified that the tests
> reliably fail when reverting both fixes:
>
> 8423be8926aa ipv6: ignore dst hint for multipath routes
> 6ac66cb03ae3 ipv4: ignore dst hint for multipath routes
>
> Can you please test with the proposed modifications?
We applied the above patches on top of 8ae9efb859, and both tests now pass:
# IPv4 multipath list receive tests
# TEST: Multipath route hit ratio (.99) [ OK ]
#
# IPv6 multipath list receive tests
# TEST: Multipath route hit ratio (1.00) [ OK ]
#
# Tests passed: 225
# Tests failed: 0
ok 17 selftests: net: fib_tests.sh
Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>
>
> Thanks
>
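For anyone reading the ratios in the logs above: going by the description in this thread, the reported number is the count of FIB lookup tracepoint hits divided by the number of packets received on veth2, so a value above 1 means more lookups than packets. Below is a minimal sketch of that arithmetic. It is not the list_rcv_eval helper from fib_tests.sh; the function names hit_ratio and ratio_ok are hypothetical, and the minimum ratio is left as a parameter because the threshold the test uses is not stated in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; NOT the list_rcv_eval
# function from tools/testing/selftests/net/fib_tests.sh.

# hit_ratio HITS PACKETS
# Prints HITS / PACKETS with two decimals. Fails if PACKETS is not positive.
hit_ratio()
{
	LC_ALL=C awk -v hits="$1" -v pkts="$2" 'BEGIN {
		if (pkts <= 0)
			exit 1
		printf "%.2f\n", hits / pkts
	}'
}

# ratio_ok RATIO MIN
# Succeeds (exit status 0) when RATIO >= MIN. MIN is an assumption chosen
# by the caller, not a value taken from the selftest.
ratio_ok()
{
	LC_ALL=C awk -v r="$1" -v min="$2" 'BEGIN { exit !(r >= min) }'
}
```

With the two failing runs quoted above, a ratio of .06 or .10 would fail any reasonable minimum, while the .99 and 1.00 from the patched runs would pass a caller-chosen minimum such as 0.95 (again, an assumed value).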