Re: [Resend] Question: kselftests: bpf/test_maps failed
From: Alexei Starovoitov
Date: Fri Feb 09 2018 - 20:36:23 EST
On Fri, Feb 09, 2018 at 03:01:57PM +0100, Daniel Borkmann wrote:
> On 02/09/2018 06:14 AM, Li Zhijian wrote:
> > Hi
> >
> > Intel 0-Day noticed that bpf/test_maps gives different results on different platforms.
> > When it fails, the output looks like this:
>
> Sorry for the late reply and thanks for reporting! More below:
>
> > ------------------
> > 880 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> > 881 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> > 882 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> > 883 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> > 884 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> > 885 Failed to create hashmap key=16 value=16384 'Cannot allocate memory'
> > 886 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> > 887 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> > 888 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> > 889 Failed to create hashmap key=16 value=65536 'Cannot allocate memory'
> > 890 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> > 891 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> > 892 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> > 893 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> > 894 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> > 895 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> > 896 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> > 897 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> > 898 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> > 899 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> > 900 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> > 901 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> > 902 Failed to create hashmap key=16 value=262144 'Cannot allocate memory'
> > 903 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> > 904 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> > 905 test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed.
> > 906 Aborted
> > 907 not ok 1..3 selftests: test_maps [FAIL]
> > ------------------
> >
> > After a quick look at the code, it looks like this is related to the number of CPUs and the system memory.
> >
> > Below are the results on different platforms:
> > 1. Good
> > model: Sandy Bridge
> > nr_node: 1
> > nr_cpu: 4
> > memory: 6G
> >
> > 2. Good
> > model: qemu-system-x86_64 -enable-kvm
> > nr_cpu: 2
> > memory: 4G
> >
> > 3. Bad
> > model: Ivytown Ivy Bridge-EP
> > nr_cpu: 48
> > memory: 64G
> >
> > 4. Bad
> > model: Skylake
> > nr_cpu: 104
> > memory: 64G
> >
> > I tried changing the process count from 100 to 10, and with that change the test passes on the Skylake machine (4) above.
> > ------------
> > lizhijian@haswell-OptiPlex-9020:~/lkp/linux/tools/testing/selftests/bpf$ git diff
> > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > index 040356e..b788ca1 100644
> > --- a/tools/testing/selftests/bpf/test_maps.c
> > +++ b/tools/testing/selftests/bpf/test_maps.c
> > @@ -960,7 +960,7 @@ static void test_map_stress(void)
> >  {
> >  	run_parallel(100, test_hashmap, NULL);
> >  	run_parallel(100, test_hashmap_percpu, NULL);
> > -	run_parallel(100, test_hashmap_sizes, NULL);
> > +	run_parallel(10, test_hashmap_sizes, NULL);
> >  	run_parallel(100, test_hashmap_walk, NULL);
> >
> >  	run_parallel(100, test_arraymap, NULL);
>
> Unless Alexei has a better idea, I think that if the bpf_create_map() error in
> the stress test is ENOMEM, we shouldn't fail hard via exit(); in all other
> cases we still should. So it probably makes sense to just check for
> errno == ENOMEM when fd < 0 in test_hashmap_sizes() and then continue,
> so the test keeps trying under stress. Feel free to send a patch, Li.
That's probably a good path for now.
I also see test_maps fail with this assert on a freshly booted kernel,
but restarting test_maps then works, and repeated runs succeed too.
I suspect there is a deeper issue here related to memory allocation.
Either the slab or the percpu allocator is behaving funky.
It needs further debugging.
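
For reference, a minimal sketch of the ENOMEM handling Daniel describes above.
It is not the actual test_maps.c change: create_map_fn is a hypothetical
stand-in for bpf_create_map() so the logic can run without BPF privileges,
the two fake_create_* helpers exist only for illustration, and the loop bounds
are an assumption chosen to cover the key/value sizes seen in the log.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for bpf_create_map(): returns an fd >= 0,
 * or -1 with errno set. */
typedef int (*create_map_fn)(int key_size, int value_size);

/* Returns 0 if every failure was ENOMEM (counted in *skipped),
 * -1 on the first failure with any other errno. */
static int try_hashmap_sizes(create_map_fn create, int *skipped)
{
	int i, j;

	assert(create != NULL && skipped != NULL);
	*skipped = 0;

	/* Assumed bounds: keys 1..512, values 1..262144, powers of two. */
	for (i = 1; i <= 512; i <<= 1) {
		for (j = 1; j <= 1 << 18; j <<= 1) {
			int fd = create(i, j);

			if (fd < 0) {
				if (errno == ENOMEM) {
					/* Expected under stress: skip, keep going. */
					(*skipped)++;
					continue;
				}
				fprintf(stderr,
					"Failed to create hashmap key=%d value=%d '%s'\n",
					i, j, strerror(errno));
				return -1;
			}
			close(fd);
		}
	}
	return 0;
}

/* Illustration only: pretends large values hit ENOMEM. */
static int fake_create_enomem(int key_size, int value_size)
{
	(void)key_size;
	if (value_size >= 8192) {
		errno = ENOMEM;
		return -1;
	}
	return open("/dev/null", O_RDONLY);
}

/* Illustration only: always fails with a non-ENOMEM error. */
static int fake_create_einval(int key_size, int value_size)
{
	(void)key_size;
	(void)value_size;
	errno = EINVAL;
	return -1;
}
```

In the real test the caller would presumably still exit(1) when the helper
returns -1, so only ENOMEM is tolerated and every other error stays fatal.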