Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports

From: Dan Rue
Date: Wed Apr 15 2020 - 12:23:18 EST


On Wed, Apr 15, 2020 at 01:09:32PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 9:28 PM Dan Rue <dan.rue@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@xxxxxx> wrote:
> > > > Well, there are other CIs beyond syzbot.
> > > > On the other hand, this makes me worry about who is testing on linux-next every day.
> > >
> > > How do these use-after-frees and locking bugs get past the
> > > unit-testing systems (which syzbot is not) and remain unnoticed for so
> > > long?...
> > > syzbot uses the dumbest VMs (GCE), so everything it triggers during
> > > boot should be triggerable pretty much everywhere.
> > > It seems to be an action point for the testing systems. "Boot to ssh"
> > > is not the best criterion. Again, if there is a LOCKDEP error, we are
> > > not catching any more LOCKDEP errors during subsequent testing. If
> > > there is a use-after-free, that's a serious error on its own and KASAN
> > > produces only 1 error by default as well. And as far as I understand,
> > > lots of kernel testing systems don't even enable KASAN, which is very
> > > wrong.
> > > I talked to +Dan Rue about this a few days ago. Hopefully LKFT will
> > > start catching these as part of unit testing, which should help with
> > > syzbot testing as well.
> >
> > LKFT has recently added testing with KASAN enabled and improved the
> > kernel log parsing to catch more of this class of errors while
> > performing our regular functional testing.
> >
> > Incidentally, -next was also broken for us from March 25 through April 5
> > due to a perf build failure[0], which eventually made its way all the
> > way down into the v5.6 release and, I believe, the first two 5.6.x
> > stable releases.
> >
> > For -next, LKFT's gap is primarily reporting. We do build and run over
> > 30k tests on every -next daily release, but we send out issues manually
> > when we see them because triaging is still a manual effort. We're
> > working to build better automated reporting. If anyone is interested in
> > watching LKFT's -next results more closely (warning, it's a bit noisy),
> > please let me know. Watching the results at https://lkft.linaro.org
> > provides some overall health indications, but again, it gets pretty
> > difficult to separate signal from noise once you start drilling down
> > without sufficient context about the system.
>
> What kind of failures and noise do you get? Is it flaky tests?
> I would assume build failures are ~0% flaky/noisy. And boot failures
> are maybe ~1% flaky/noisy due to some infra issues.

Right - infrastructure problems aside (which are the easy part), tests
are quite flaky/noisy.

I guess we're getting quite off topic now, but in LKFT's case we run
tests that are available from the likes of LTP, kselftest, and a variety
of other test suites. Every test was written by a developer with certain
assumptions in place - many of which we violate when we run them on a
small arm board, for example. And many may just be low quality to begin
with, but they often work well enough for the original author's
use-case.

In such cases, we mark them (manually at this point) as a known issue.
For example, here are our kselftest known issues:
https://github.com/Linaro/qa-reports-known-issues/blob/master/kselftests-production.yaml

These lists are quite a chore to keep up to date, and so they tend to
lag reality. What's needed (and what we're working toward) is more
sophisticated analytics on top of our results to determine actual
regressions.

I'll give just one example, randomly selected but typical. Here's a
timer test that sometimes passes and sometimes fails: it compares how
long something takes against a hard-coded value of what the author
expects. Running on small arm hosts or under qemu, the following check
sometimes fails:
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/timers/rtcpie.c#L104-L111
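
To make the failure mode concrete, here is a small standalone sketch of
that kind of check. It is simplified and not the actual rtcpie.c source;
the 64 Hz rate, the usleep() stand-in for waiting on the interrupt, and
the 10% margin are assumptions made for the example:

/*
 * Sketch of a timing check with a hard-coded tolerance, in the spirit
 * of the rtcpie.c lines linked above (not the actual selftest code).
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
        long freq_hz = 64;                      /* assumed periodic rate */
        long expected_us = 1000000L / freq_hz;  /* ideal period in usec */
        struct timeval start, end, diff;

        gettimeofday(&start, NULL);
        usleep(expected_us);            /* stand-in for waiting on the IRQ */
        gettimeofday(&end, NULL);
        timersub(&end, &start, &diff);

        /* Hard-coded 10% slack: fine on idle bare metal, easily exceeded
         * on a loaded arm board or inside qemu. */
        if (diff.tv_sec > 0 || diff.tv_usec > expected_us * 1.10) {
                fprintf(stderr, "delta %ld.%06ld should be close to 0.%06ld\n",
                        (long)diff.tv_sec, (long)diff.tv_usec, expected_us);
                return EXIT_FAILURE;
        }
        printf("ok: 0.%06ld is within 10%% of 0.%06ld\n",
               (long)diff.tv_usec, expected_us);
        return EXIT_SUCCESS;
}

Nothing in that check is wrong per se; it just encodes "my machine is
fast enough", which stops being true on the hardware we run.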

There are _many_ such tests - hundreds or thousands - which rely on
hard-coded expectations and are quite hard to "fix". But we run them all
anyway, because most of them haven't failed yet, and if one does we'll
find out why.

In general, we ignore tests which either always fail or sometimes fail.
I'm sure there are some legitimate bugs in that set of failures, but
they're probably not "regressions", so just as syzkaller lets old bugs
close automatically, we ignore tests that have a history of failing.
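
As a rough illustration of the kind of automation we're working toward
(this is not LKFT's actual tooling; the 10-run window and the data
layout are made up for the example), the decision boils down to
something like:

/*
 * Sketch: flag a failure as a candidate regression only when the test
 * has a clean recent history; anything with a prior failure is treated
 * as a known flaky/failing test and ignored.
 */
#include <stdbool.h>
#include <stdio.h>

#define HISTORY 10      /* arbitrary: look at the last 10 runs */

struct test_history {
        const char *name;
        bool passed[HISTORY];   /* oldest .. newest previous results */
};

static bool candidate_regression(const struct test_history *t, bool failed_now)
{
        if (!failed_now)
                return false;
        for (int i = 0; i < HISTORY; i++)
                if (!t->passed[i])
                        return false;   /* failed before: known issue */
        return true;                    /* clean history, new failure */
}

int main(void)
{
        struct test_history rtcpie = {
                .name = "timers/rtcpie",
                /* one old failure in the window, so a new one is ignored */
                .passed = { true, true, false, true, true,
                            true, true, true, true, true },
        };

        if (candidate_regression(&rtcpie, true))
                printf("%s: possible regression, report it\n", rtcpie.name);
        else
                printf("%s: known flaky/failing test, ignored\n", rtcpie.name);
        return 0;
}

The real heuristics need to be smarter than that (windows, environments,
flapping rates), but the shape of the problem is the same: only a
failure with a clean history is worth a human's attention.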

>
> I can't find any actual test failure logs in the UI. I've got to this page:
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/suite/kselftest/tests/
> which seems to contain failed tests on mainline. But I still can't find
> the actual test failure logs.

From the link you gave, if you go up one level to
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/,
you will see "Log File" links, which take you to
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.7-rc1-24-g8632e9b5645b/testrun/1363280/log.

For some test suites (perhaps just LTP), we have logs per test. For
most, we just have one large log of the entire run. Even when we have a
log per test, I expect it may miss some asynchronous dmesg output, so an
investigator ends up looking at the whole log anyway.
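
The improved log parsing mentioned earlier is conceptually just a scan
of that whole-run log for kernel error signatures. A minimal sketch
(not our actual parser; the signature list is illustrative and far from
complete):

/*
 * Scan a test-run log on stdin for kernel error signatures such as
 * KASAN and lockdep splats.  Usage: ./scan-log < testrun.log
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        static const char * const signatures[] = {
                "BUG: KASAN:",
                "possible circular locking dependency",
                "kernel BUG at",
                "WARNING: CPU:",
        };
        size_t nsig = sizeof(signatures) / sizeof(signatures[0]);
        char line[4096];
        long lineno = 0;
        int hits = 0;

        while (fgets(line, sizeof(line), stdin)) {
                lineno++;
                for (size_t i = 0; i < nsig; i++) {
                        if (strstr(line, signatures[i])) {
                                printf("line %ld: %s", lineno, line);
                                hits++;
                                break;
                        }
                }
        }
        return hits ? 1 : 0;
}

The hard part is everything around that scan - deciding which hits are
new, which test they belong to, and which are already-known issues -
which is where the analytics I mentioned above come in.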

Dan

>
>
> > Dan
> >
> > [0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@xxxxxxxxxxxxxx/
> >
> > --
> > Linaro LKFT
> > https://lkft.linaro.org

--
Linaro LKFT
https://lkft.linaro.org