Re: Kernel regression tracking/reporting initiatives and KCIDB

From: Nikolai Kondrashov
Date: Mon Aug 07 2023 - 04:29:23 EST


Hi Thorsten,

On 8/2/23 11:07, Thorsten Leemhuis wrote:
> On 01.08.23 13:47, Ricardo Cañuelo wrote:
>> So far, we’ve been using the KernelCI regression data and reports as
>> data source, we're now wondering if we could tackle the problem with a
>> more general approach by building on top of what KCIDB already provides.
>
> That's more your area of expertise, but I have to wonder: doesn't that
> mainly depend on what the people/projects want which feed their test
> results into KCIDB? I had expected some of them might already have
> something to stay on top of regressions found by their systems, to at
> least ensure they notice and fix tests that broke for external reasons
> -- e.g. a test script going sideways, faulty hardware, a network
> misconfiguration, or other things which naturally will occur in this
> line of work.

Yes, some of this is already done by the CI systems submitting results to
KCIDB. Syzbot does a very good job of deduplicating the crashes it finds,
0day looks for outcome differences (AFAIK), and CKI has its known-issue
tracking system, which handles problems of various origins.
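
To illustrate the kind of check outcome diffing boils down to, here is a
minimal sketch in Python (the test names, statuses, and data layout are made
up for the example, not taken from any of these systems):

# Flag tests that passed on the previous revision but fail on the
# current one -- a toy version of an outcome-difference check.
BASELINE = {"boot.smoke": "PASS", "kselftest.net": "PASS", "ltp.syscalls": "FAIL"}
CURRENT = {"boot.smoke": "PASS", "kselftest.net": "FAIL", "ltp.syscalls": "FAIL"}

def outcome_differences(baseline, current):
    """Return tests whose outcome changed from PASS to FAIL."""
    return sorted(
        test for test, status in current.items()
        if status == "FAIL" and baseline.get(test) == "PASS"
    )

if __name__ == "__main__":
    for test in outcome_differences(BASELINE, CURRENT):
        print(f"possible regression: {test}")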

>> In general, CI systems tend to define regressions as a low-level concept
>> which is rather static: a snapshot of a test result at a certain point
>> in time. When it comes to reporting them to developers, there's much
>> more info that could be added to them.
>
> I wonder if it should be s/could/should/ here, as *if I* were
> running CI systems I'd fear that developers sooner or later might start
> ignoring more and more of the reports my systems send when too many of
> them turn out to be bogus/misleading -- which naturally will happen for
> various reasons you outlined below yourself (broken
> hardware/test/network/...) (and seems to happen regularly, as mentioned
> in https://lwn.net/Articles/939538/ ).

Yes, this is a constant struggle.

> That doesn't mean that I think each failed test should be judged by a
> human before it's sent to the developers. Compile errors, for example,
> will often be helpful right away, especially for stable-rc.

Ehhh, KCIDB gets build failures all the time (in merged code) and it takes
a while before a fix propagates across all the trees.

For example, the recent v6.5-rc5 got 14 build failures (out of the 865
builds received):

https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=52a93d39b17dc7eb98b6aa3edb93943248e03b2f&var-patchset_hash=

I suspect that someone, somewhere, is already working on these, or that
fixes have even been merged somewhere already, but CI just keeps failing in
the meantime.
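
For what it's worth, a count like that can in principle be derived straight
from KCIDB-style I/O data rather than the dashboard. A rough sketch, with
field names ("checkouts", "git_commit_hash", "checkout_id", "valid") based on
my reading of the KCIDB submission schema, so treat them as illustrative:

import json
import sys

def count_build_failures(io_data, git_commit_hash):
    """Count failed vs. total builds for one revision."""
    checkout_ids = {
        c["id"] for c in io_data.get("checkouts", [])
        if c.get("git_commit_hash") == git_commit_hash
    }
    builds = [
        b for b in io_data.get("builds", [])
        if b.get("checkout_id") in checkout_ids
    ]
    failed = sum(1 for b in builds if b.get("valid") is False)
    return failed, len(builds)

if __name__ == "__main__":
    data = json.load(open(sys.argv[1]))
    failed, total = count_build_failures(
        data, "52a93d39b17dc7eb98b6aa3edb93943248e03b2f")
    print(f"{failed} build failures out of {total} builds")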

>> In particular, the context of it
>> and the fact that a reported regression has a life cycle:
>>
>> - did this test also fail on other hardware targets or with other kernel
>> configurations?
>> - is it possible that the test failed because of an infrastructure
>> error?
>> - does the test fail consistently since that commit or does it show
>> unstable results?
>> - does the test output show any traces of already known bugs?
>> - has this regression been bisected and reported anywhere?
>> - was the regression reported by anyone? If so, is there someone already
>> working on it?
>>
>> Many of these info points can be extracted from the CI results databases
>> and processed to provide additional regression data. That’s what we’re
>> trying to do with the Regression Tracker tool, and we think it’d be
>> interesting to start experimenting with the data in KCIDB to see how
>> this could be improved and what would be the right way to integrate this
>> type of functionality.
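
For illustration, the kind of enriched record I imagine such a tool producing
could look roughly like this (every field name below is made up for the
example, not taken from the Regression Tracker):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EnrichedRegression:
    test_name: str
    suspect_commit: str
    # Context gathered from the results database:
    also_fails_on: list = field(default_factory=list)  # other hardware/configs
    infra_error_suspected: bool = False                # lab/network problem?
    flaky: bool = False                                # unstable results?
    matched_known_issues: list = field(default_factory=list)
    # Life-cycle information:
    bisected_to: Optional[str] = None
    reported_at: Optional[str] = None                  # e.g. a lore.kernel.org link
    assignee: Optional[str] = None
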
>
> I (with my likely somewhat biased view due to regzbot and my work with
> it) wonder if we have two aspects here that might be wise to keep separated:
>
> * tests suddenly failing in one or multiple CI systems, which might be
> due to something going sideways in the tests or a real kernel regression
>
> * regressions found by individuals or CI systems where a human with some
> knowledge about the kernel did a sanity check (and also looked for
> duplicates) to ensure this most likely is a regression that should be
> acted upon -- and thus is also something that definitely should not be
> forgotten.
>
> Your regression tracking tool could be the former, regzbot the latter
> (which could feed the outcome back to the CI regression tracking
> system). But as I said, my view is obviously biased, so maybe I'm too
> blinded to see a better solution.

I agree that a human would be trusted more most of the time, and it would be
beneficial to give the results of human review a boost. Ultimately, though,
automatic error detection is also written by humans, and it doesn't get
tired, can catch harder-to-spot problems, and can handle problems happening
en masse, as you mention.

If we consider applying patterns defined by humans to find already-known
issues in other test results, we get a combination of the two. I think that
training an AI on the manually-detected issues, and on those picked up by
such patterns, could help us find completely new issues, and would further
blur the line between manual and automatic issue detection. That's something
I'm looking forward to exploring.
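
To make the "patterns defined by humans" part concrete, here is a minimal
sketch of what such known-issue matching could look like (the patterns and
the log text are invented examples, not real KCIDB or CKI data):

import re

# Human-curated signatures of already-known issues (hypothetical examples).
KNOWN_ISSUES = {
    "infra: NFS mount timeout": re.compile(r"mount\.nfs: Connection timed out"),
    "known bug: WARNING in xyz_driver": re.compile(r"WARNING:.*xyz_driver"),
}

def match_known_issues(log_text):
    """Return the names of known issues whose pattern appears in the log."""
    return [name for name, pattern in KNOWN_ISSUES.items()
            if pattern.search(log_text)]

if __name__ == "__main__":
    log = "boot ok\nmount.nfs: Connection timed out\ntest aborted\n"
    hits = match_known_issues(log)
    print("known issues matched:", hits or "none -- possibly a new failure")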

Regardless, I think we need both, and, in general, every trick in the book to
get Linux quality control on track.

Nick