Re: [bug/regression] libhugetlbfs testsuite failures and OOMs eventually kill my system

From: Jan Stancek
Date: Fri Oct 14 2016 - 04:49:00 EST


On 10/14/2016 01:26 AM, Mike Kravetz wrote:
>
> Hi Jan,
>
> Any chance you can get the contents of /sys/kernel/mm/hugepages
> before and after the first run of libhugetlbfs testsuite on Power?
> Perhaps a script like:
>
> cd /sys/kernel/mm/hugepages
> for f in hugepages-*/*; do
> n=`cat $f`;
> echo -e "$n\t$f";
> done
>
> Just want to make sure the numbers look as they should.
>

Hi Mike,

Numbers are below. I have also isolated a single testcase from the "func"
group of tests: corrupt-by-cow-opt [1]. This test starts failing on the
19th consecutive run (with a pool of 20 hugepages). If I disable it, the
"func" group tests all pass repeatedly.

[1] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/corrupt-by-cow-opt.c
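
The isolation above can be reproduced without manual reruns by a loop that
runs the testcase, logs resv_hugepages after each run, and stops at the
first failure. This is a minimal sketch, assuming bash and the
libhugetlbfs-2.18 build-tree paths used in the transcript below; the
function name repro_loop and the HP_DIR/MAX_RUNS variables are hypothetical
and not part of libhugetlbfs.

```shell
#!/usr/bin/env bash
# Rerun a hugetlb testcase and log resv_hugepages after every run.
# HP_DIR and MAX_RUNS are hypothetical knobs for this sketch; the default
# HP_DIR is the 16M (16384kB) pool used in the counters below.
HP_DIR=${HP_DIR:-/sys/kernel/mm/hugepages/hugepages-16384kB}
MAX_RUNS=${MAX_RUNS:-25}

repro_loop() {
    local run status resv
    for ((run = 1; run <= MAX_RUNS; run++)); do
        "$@" >/dev/null 2>&1
        status=$?
        resv=$(cat "$HP_DIR/resv_hugepages")
        echo "run $run: exit $status, resv_hugepages $resv"
        # Stop at the first failing run (e.g. Bus error or ENOMEM).
        if [ "$status" -ne 0 ]; then
            return "$status"
        fi
    done
    return 0
}

# Usage, from the libhugetlbfs-2.18 build tree:
#   repro_loop env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt
```

If each run leaks one reservation, the logged resv_hugepages value climbs by
one per line until the run that fails.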

Regards,
Jan

Kernel is v4.8-14230-gb67be92, with a reboot before each of the three scenarios below.
1) Only func tests
System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After func tests:
********** TEST SUMMARY
*                      16M
*                      32-bit 64-bit
*     Total testcases:      0     85
*             Skipped:      0      0
*                PASS:      0     81
*                FAIL:      0      4
*    Killed by signal:      0      0
*   Bad configuration:      0      0
*       Expected FAIL:      0      0
*     Unexpected PASS:      0      0
* Strange test result:      0      0

26 hugepages-16384kB/free_hugepages
26 hugepages-16384kB/nr_hugepages
26 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After test cleanup:
umount -a -t hugetlbfs
hugeadm --pool-pages-max ${HPSIZE}:0

1 hugepages-16384kB/free_hugepages
1 hugepages-16384kB/nr_hugepages
1 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
1 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages
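
After cleanup here (and likewise in scenario 2 below), resv_hugepages is
still non-zero and nr_hugepages stays at that value, with the leftover pages
reported as surplus_hugepages. With no hugetlbfs mounts and no huge page
mappings left, both resv_hugepages and surplus_hugepages are expected to
read 0. A minimal post-cleanup check, assuming bash; check_pool_clean and
HP_DIR are hypothetical names used only for this sketch.

```shell
#!/usr/bin/env bash
# Report pool counters that are expected to be 0 once all hugetlbfs mounts
# and huge page mappings are gone. HP_DIR is a hypothetical override.
HP_DIR=${HP_DIR:-/sys/kernel/mm/hugepages/hugepages-16384kB}

check_pool_clean() {
    local f n leaked=0
    for f in resv_hugepages surplus_hugepages; do
        n=$(cat "$HP_DIR/$f")
        if [ "$n" -ne 0 ]; then
            echo "$f=$n (expected 0)"
            leaked=1
        fi
    done
    # Exit status 1 means at least one counter was left non-zero.
    return "$leaked"
}

# Usage, right after the cleanup commands above:
#   umount -a -t hugetlbfs
#   hugeadm --pool-pages-max ${HPSIZE}:0
#   check_pool_clean
```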

---

2) Only stress tests
System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After stress tests:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
17 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

After cleanup:
17 hugepages-16384kB/free_hugepages
17 hugepages-16384kB/nr_hugepages
17 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
17 hugepages-16384kB/resv_hugepages
17 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

---

3) Only corrupt-by-cow-opt

System boot
After setup:
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
0 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

libhugetlbfs-2.18# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3298
Write s to 0x3effff000000 via shared mapping
Write p to 0x3effff000000 via private mapping
Read s from 0x3effff000000 via shared mapping
PASS
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
1 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3312
Write s to 0x3effff000000 via shared mapping
Write p to 0x3effff000000 via private mapping
Read s from 0x3effff000000 via shared mapping
PASS
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
2 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

(... output cut from ~17 iterations ...)

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3686
Write s to 0x3effff000000 via shared mapping
Bus error
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
19 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages

# env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt; /root/grab.sh
Starting testcase "./tests/obj64/corrupt-by-cow-opt", pid 3700
Write s to 0x3effff000000 via shared mapping
FAIL mmap() 2: Cannot allocate memory
20 hugepages-16384kB/free_hugepages
20 hugepages-16384kB/nr_hugepages
20 hugepages-16384kB/nr_hugepages_mempolicy
0 hugepages-16384kB/nr_overcommit_hugepages
19 hugepages-16384kB/resv_hugepages
0 hugepages-16384kB/surplus_hugepages
0 hugepages-16777216kB/free_hugepages
0 hugepages-16777216kB/nr_hugepages
0 hugepages-16777216kB/nr_hugepages_mempolicy
0 hugepages-16777216kB/nr_overcommit_hugepages
0 hugepages-16777216kB/resv_hugepages
0 hugepages-16777216kB/surplus_hugepages
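
The first two runs above each leave resv_hugepages one higher (0, 1, 2)
while free_hugepages stays at 20. To see exactly which counters a single
run changes, two grab.sh snapshots can be compared. A minimal sketch,
assuming bash, awk, and snapshots saved to files in the
"<value><whitespace><path>" format printed by the script quoted above;
diff_counters and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the counters whose value differs between two grab.sh snapshots.
# Each snapshot line is "<value><TAB or space><path>".
diff_counters() {
    awk 'NR == FNR { old[$2] = $1; next }
         ($2 in old) && old[$2] != $1 { printf "%s: %s -> %s\n", $2, old[$2], $1 }' "$1" "$2"
}
```

For example:

  /root/grab.sh > before.txt
  env LD_LIBRARY_PATH=./obj64 ./tests/obj64/corrupt-by-cow-opt
  /root/grab.sh > after.txt
  diff_counters before.txt after.txt

For a run like the second one above, this would print only
"hugepages-16384kB/resv_hugepages: 1 -> 2".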