Towards 4.14 LTS

From: Tom Gall
Date: Thu Nov 16 2017 - 23:50:38 EST


At Linaro we've been putting effort into regularly running kernel tests over
arm, arm64 and x86_64 targets. On those targets we're running mainline, -next,
4.4, and 4.9 kernels and yes, we are adding to this list as the hardware
capacity grows.

For test buckets we're using just LTP, kselftest and libhugetlbfs and,
as with kernels, we will add to this list.

With the 4.14 cycle being a little 'different', in so much as the goal was
to have it be an LTS kernel, I think it's important to take a look at some
4.14 test results.

Grab a beverage, this is a bit of a long post. But the quick summary: 4.14 as
released looks just as good as 4.13 for the test buckets I named above.

I've enclosed our short form report. We break down the boards/arch combos for
each bucket as pass/skip or potentially fail. Pretty straightforward. Skips
generally happen for a few reasons:
1) crappy test cases
2) the test isn't appropriate (e.g. x86-specific tests don't run elsewhere)

With this, we have a decent baseline for 4.14 and other kernels going
forward.

Summary
------------------------------------------------------------------------

kernel: 4.14.0
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git branch: master
git commit: bebc6082da0a9f5d47a1ea2edc099bf671058bd4
git describe: v4.14
Test details: https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v4.14


No regressions (compared to build v4.14-rc8)

Boards, architectures and test suites:
-------------------------------------

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 16, pass: 38
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 76
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 122, pass: 983
* ltp-timers-tests - pass: 12

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 15, pass: 38
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 76
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 156, pass: 943
* ltp-timers-tests - pass: 12

x15 - arm
* boot - pass: 20
* kselftest - skip: 17, pass: 36
* libhugetlbfs - skip: 1, pass: 87
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - pass: 60
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 2, pass: 20
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 13
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 66, pass: 1040
* ltp-timers-tests - pass: 12

dell-poweredge-r200 - x86_64
* boot - pass: 19
* kselftest - skip: 11, pass: 54
* libhugetlbfs - skip: 1, pass: 76
* ltp-cap_bounds-tests - pass: 1
* ltp-containers-tests - pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 1, pass: 61
* ltp-fs_bind-tests - pass: 1
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 8
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 9
* ltp-securebits-tests - pass: 3
* ltp-syscalls-tests - skip: 163, pass: 962

Lots of green.


Let's now talk about coverage, the Pandora's box of validation. It's never
perfect. There are a bazillion different build combos. Even tools can
make a difference. We've seen a case where the dhcp client from OpenEmbedded
didn't trigger a network regression in one of the LTS RCs but Debian's dhclient
did.

Unsurprisingly, between what we and others have, it's not perfect coverage,
and there are only so many build, boot and run cycles in which to execute the
test buckets with various combinations, so we need to stay sensible as far as
kernel configs go.

Does this kind of system actually FIND anything, and is it useful for
watching for 4.14 regressions as fixes are introduced?

I would assert the answer is yes. We do have data for a couple of kernel
cycles but it's also somewhat dirty, as we have been in the process of
detecting and tossing out dodgy test cases.

Take 4.14-rc7: there was one failure that is no longer there:
ltp-syscalls-tests : perf_event_open02 (arm64)

As things are getting merged post-4.14 there are some failures
cropping up. Here's an example:
https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/ltp-fs-tests/proc01

Note the Build column; the kernels are identified by their git describe.
Don't be alarmed if you see n/a in some columns, the queues are catching up
so data will be filling in.


So why didn't we report these? As mentioned, we've been tossing out dodgy
test cases to get to a clean baseline. We don't need or want noise.

For LTS, when the system detects a failure, I want it to enable a quick
bisect involving the affected test bucket. Given the nature of kernel
bugs though, there is that class of bug which only happens occasionally.
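The quick-bisect step above can be sketched with `git bisect run`, which
drives a pass/fail script to the first bad commit automatically. The
throwaway repo and check script below are fabricated stand-ins for a kernel
tree and a failing test bucket, purely for illustration:

```shell
# Toy stand-in for a kernel tree: five commits, with a "regression"
# introduced at commit 4. All names here are fabricated for this sketch.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
# Stand-in for "run the affected test bucket": exit 0 = pass, non-zero = fail.
# Here the "test" fails from commit 4 onward.
cat > check.sh <<'EOF'
#!/bin/sh
test "$(cat state)" -lt 4
EOF
chmod +x check.sh
git bisect start HEAD HEAD~4   # bad = tip, good = first commit
git bisect run ./check.sh      # walks the range, reporting the first bad commit
```

In the real case the check script would boot the board and run the affected
bucket, returning non-zero on failure; git does the rest.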

This brings up a conundrum when you have a system like this. A failure
turns up, but it isn't failing consistently and a path forward isn't
necessarily obvious. Remember, for an LTS RC there's a defined window
in which to comment.

I've been flamed for reporting an LTS RC test failure which didn't include
a fix, just a "this fails, and we're looking at it." I've been flamed
for not reporting a failure that had been detected but not raised to the
list since it was still being debugged after the RC comment window had
closed.

My 1990s vintage asbestos underwear thankfully is functional.

There is probably a case to be made either way. It boils down to
either:

Red Pill) Be fully open, reporting early and often.
Blue Pill) Be closed, and only pass up failures that include a patch to fix the bug.

Red Pill does expose drama yet it also creates an opportunity for others to
get involved.

Blue Pill protects the community from noise, and from the frustration
created when the system cries wolf over what is perhaps a stupid test case.

Likewise from a maintainer or dev perspective, there's a sea of data.
Time is precious, and who wants to waste it on some snipe hunt?

I'm personally in the Red Pill camp. I like being open.

Be it 0day, LKFT or whatever, I think the responsibility is on those of us
running these projects to be open and give full guidance. Yes, there
will be noise. Noise can suggest dodgy test cases or bugs that are
hard to trigger. Either way they warrant a look. Take Arnd Bergmann's
work to get rid of kernel warnings. Same concept, in my opinion.

Dodgy test cases can easily be put onto skip lists. As we've been
running for a number of months now, data and old-fashioned code
review have been our guide in banishing dodgy test cases to skip lists.
Going forward new test cases will pop up. Some of them will be dodgy.
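As a sketch of what such a skip list looks like in practice: LTP's runltp
accepts a skip file via its -S option. The tags and install path below are
assumptions for illustration, not our actual list:

```shell
# Hypothetical skip list: one LTP test tag per line, '#' starts a comment.
skiplist=$(mktemp)
cat > "$skiplist" <<'EOF'
# dodgy cases banished pending a fix or a better test
proc01
perf_event_open02
EOF
# On a board with LTP installed, the syscalls bucket could then be run
# with the listed cases skipped (install path is an assumption):
#   /opt/ltp/runltp -f syscalls -S "$skiplist"
grep -cv '^#' "$skiplist"   # counts the tags queued for skipping
```

Keeping the banished tags in a file under review, rather than deleting the
tests, leaves a visible trail for anyone who wants to fix the dodgy cases.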

There's lots of room for collaboration in improving test cases.

In summary, I think for mainline, LTS kernels etc. we have a good
warning system to detect regressions as patches flow in. It will evolve
and improve, as is the nature of our open community. Between kernelci,
LKFT, 0day, etc., that's a good set of automated systems to ferret out
problems introduced by patches.

Tom