Re: Kernel regression tracking/reporting initiatives and KCIDB
From: Nikolai Kondrashov
Date: Fri Aug 04 2023 - 12:06:45 EST
Hi Ricardo,
On 8/1/23 14:47, Ricardo Cañuelo wrote:
> Hi all,
>
> I'm Ricardo from Collabora. In the past months, we’ve been analyzing the
> current status of CI regression reporting and tracking in the Linux
> kernel: assessing the existing tools, testing their functionalities,
> collecting ideas about desirable features that aren’t available yet and
> sketching some of them.
>
> As part of this effort, we wrote a Regression Tracker tool [1] as a
> proof of concept. It’s a rather simple tool that takes existing
> regression data and reports and uses them to show more context on each
> reported regression, as well as highlighting the relationships between
> them, whether they can be caused by an infrastructure error and other
> additional metadata about their current status. We’ve been using it
> mostly as a playground for us to explore the current status of the
> functionalities provided by CI systems and to test ideas about new
> features.
>
> We’re also checking other tools and services provided by the community,
> such as regzbot [2], collaborating with them when possible and thinking
> about how to combine multiple scattered efforts by different people
> towards the same common goal. As a first step, we’ve contributed to
> regzbot and partially integrated its results into the Regression Tracker
> tool.
Nicely done!
Especially the cooperation with regzbot, which is something I haven't seen so far.
Various other kernel CI systems have been building similar things for a while,
and it's nice to see the trend growing. It means we're getting somewhere.
I tried to review these efforts last year:
https://archive.fosdem.org/2022/schedule/event/masking_known_issues_across_six_kernel_ci_systems/
> So far, we’ve been using the KernelCI regression data and reports as a
> data source, we're now wondering if we could tackle the problem with a
> more general approach by building on top of what KCIDB already provides.
Yes, I would love to work with you on a KCIDB implementation/integration.
I've been exploring and implementing a solution for tracking regressions (or
"known issues" as I usually call them) based on what I researched (and
presented above).
At this moment KCIDB submitters can send data linking a particular test or
build result to an issue, along with the issue's category
(kernel/test/framework). We can generate notifications on e.g. a new issue
being found by a CI system in a particular repo/branch, aimed at e.g. the
maintainers. There's no dashboard support yet, and I have yet to push for
integration with particular CI systems.
Here's the full announcement with examples:
https://lore.kernel.org/kernelci/182b43fa-0261-11b7-2edb-f379a669bc28@xxxxxxxxxx/
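For illustration, here's roughly what such a submission could look like,
expressed as a Python dict that would be dumped to JSON and fed to
kcidb-submit. All IDs, names and URLs are made up, and the field names are
only my approximation; the announcement above has the authoritative examples:

    # Illustrative sketch only: made-up IDs/URLs, approximate field names.
    submission = {
        "version": {"major": 4, "minor": 0},
        "issues": [{
            "id": "myci:ext4-null-deref",
            "version": 1,
            "origin": "myci",
            "report_url": "https://example.org/bug/12345",
            "comment": "NULL dereference on ext4 mount",
            # Category of the issue: kernel code, the test itself,
            # or the test framework.
            "culprit": {"code": True, "tool": False, "harness": False},
        }],
        "incidents": [{
            # Links the issue above to one specific test result.
            "id": "myci:ext4-null-deref:run-42",
            "origin": "myci",
            "issue_id": "myci:ext4-null-deref",
            "issue_version": 1,
            "test_id": "myci:test-run-42",
            "present": True,
        }],
    }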
Admittedly it's not much, and you're much further along (as are many other CI
systems), but we have further plans (more on that below).
> In general, CI systems tend to define regressions as a low-level concept
> which is rather static: a snapshot of a test result at a certain point
> in time. When it comes to reporting them to developers, there's much
> more info that could be added to them. In particular, the context of it
> and the fact that a reported regression has a life cycle:
>
> - did this test also fail on other hardware targets or with other kernel
> configurations?
> - is it possible that the test failed because of an infrastructure
> error?
> - does the test fail consistently since that commit or does it show
> unstable results?
> - does the test output show any traces of already known bugs?
> - has this regression been bisected and reported anywhere?
> - was the regression reported by anyone? If so, is there someone already
> working on it?
>
> Many of these info points can be extracted from the CI results databases
> and processed to provide additional regression data. That’s what we’re
> trying to do with the Regression Tracker tool, and we think it’d be
> interesting to start experimenting with the data in KCIDB to see how
> this could be improved and what would be the right way to integrate this
> type of functionality.
These are all very useful insights to extract from the data, nicely done!
Here's how they map to KCIDB:
> - did this test also fail on other hardware targets or with other kernel
> configurations?
KCIDB doesn't have a schema for identifying hardware at this moment. We can
work on that, but meanwhile KCIDB dashboards wouldn't be able to show this.
> - is it possible that the test failed because of an infrastructure
> error?
Not sure how to approach this in KCIDB. How do you (plan to) do it?
> - does the test fail consistently since that commit or does it show
> unstable results?
This is a difficult thing to properly figure out in KCIDB, because it
aggregates data from multiple CI systems. A single CI system can assume that
earlier results for a branch correspond to earlier commits. However, because
different CI systems test at different speeds, KCIDB cannot make that
assumption. Commits and their testing results can come in any order. So we
cannot draw these kinds of conclusions based on time alone.
The only way KCIDB can make this work correctly is by correlating with actual
git history, following the commit graph. I did some research into graph
databases and while they can potentially help us do it, their performance with
the actual Linux kernel git history turned out to be abysmal, due to a large
number of nodes and edges, and the lack of optimization for DAGs:
https://fosdem.org/2023/schedule/event/graph_case_for_dag/
I got an optimistic promise from the Neo4j folks that this could possibly be
working by next FOSDEM, but I wouldn't hold my breath for that. The fallback
plan is to hack something together using libgit2 and/or the git command-line
tools.
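As a taste of what that fallback could look like, here's a minimal sketch (in
Python, with a made-up function name) that asks the git CLI whether one
tested commit is an ancestor of another, which is exactly the ordering
information we cannot get from timestamps:

    import subprocess

    def is_ancestor(repo_path, maybe_ancestor, commit):
        """Return True if maybe_ancestor is an ancestor of commit."""
        result = subprocess.run(
            ["git", "-C", repo_path, "merge-base", "--is-ancestor",
             maybe_ancestor, commit],
            capture_output=True,
        )
        # git exits with 0 for "is an ancestor", 1 for "is not",
        # and anything else for an error (e.g. an unknown commit).
        if result.returncode in (0, 1):
            return result.returncode == 0
        raise RuntimeError(result.stderr.decode())

    # E.g. results for commit_b supersede results for commit_a on the same
    # branch if commit_a is an ancestor of commit_b, regardless of which
    # result reached KCIDB first:
    # supersedes = is_ancestor("/path/to/linux", commit_a, commit_b)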
Before that happens, I think we can still do other things, based just on
time, to help us along.
E.g. on the dashboard of a particular test result, we could display graphs of
this test's results over time: overall, and filtered to the
architecture/compiler/config/repo/branch of this test run. And something
similar for the test views on revision/checkout/build dashboards.
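The aggregation behind such a graph could be as simple as the following
sketch (the result dicts and field names are made up for illustration, not
taken from any actual dashboard code):

    from collections import defaultdict

    def pass_rate_by_day(results, group_by=None):
        """results: an iterable of dicts with 'start_time' (ISO 8601 string),
        'status' ('PASS'/'FAIL'), and optional grouping fields such as
        'architecture' or 'config_name'."""
        buckets = defaultdict(lambda: [0, 0])  # key -> [passes, total]
        for r in results:
            day = r["start_time"][:10]                       # YYYY-MM-DD
            key = (day, r.get(group_by)) if group_by else day
            buckets[key][1] += 1
            if r["status"] == "PASS":
                buckets[key][0] += 1
        return {key: passes / total
                for key, (passes, total) in buckets.items()}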
BTW, a couple of Mercurial folks approached me after the talk above, saying
that they're working on support for storing test results in history, so they
could do a similar kind of correlation and reasoning. So the idea is in the
air.
> - does the test output show any traces of already known bugs?
> - was the regression reported by anyone? If so, is there someone already
> working on it?
This is what the KCIDB issue-linking support described above is working
towards. The next step is to build a triaging system that links issues to
build/test results automatically, based on patterns submitted both by CI
systems (via the regular submission interface) and by humans (via a dedicated
UI).
Patterns would specify which issue (bug URL) they match and include basic
things like test name, architecture, hardware, and so on, but also patterns
to look for in e.g. test output files, logs, or dmesg.
That should answer the question of whether a test or a build exhibits a
particular issue seen before.
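A minimal sketch of the kind of matching such a triaging system could do,
with made-up field and pattern names (not the actual KCIDB design):

    import re

    def result_matches_issue(result, pattern):
        """result: dict with e.g. 'path', 'architecture' and 'log_text'.
        pattern: dict with optional 'test_path', 'architecture' and
        'log_regex' keys; absent keys match anything."""
        if "test_path" in pattern and \
                result.get("path") != pattern["test_path"]:
            return False
        if "architecture" in pattern and \
                result.get("architecture") != pattern["architecture"]:
            return False
        if "log_regex" in pattern and \
                not re.search(pattern["log_regex"],
                              result.get("log_text", "")):
            return False
        return True

    # Example: link any arm64 boot failure whose log mentions a known oops
    # to the corresponding issue (values made up for illustration).
    known_issue_pattern = {
        "test_path": "boot",
        "architecture": "arm64",
        "log_regex": r"Unable to handle kernel NULL pointer dereference",
    }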
> - has this regression been bisected and reported anywhere?
Once we have the history correlation mentioned above, we would be able to
find the PASS/FAIL boundaries between commits for particular issues, based on
just the issue linking reported by CI systems (even before implementing
triaging).
This would be a way to detect bisections, among other things, i.e. detecting
whether two adjacent commits both have results for a particular test, and the
results differ. This would, of course, also catch cases where the results
just happened to land on adjacent commits, not only ones produced by a
bisection.
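Roughly, that boundary detection could look like this sketch, assuming
results are keyed by commit hash and history is linearized along the first
parent (all names here are made up for illustration):

    import subprocess

    def history_order(repo_path, branch):
        """Commits of a branch, oldest to newest, along the first-parent
        line."""
        out = subprocess.run(
            ["git", "-C", repo_path, "rev-list", "--first-parent",
             "--reverse", branch],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.split()

    def find_adjacent_flips(commits, outcome_by_commit):
        """outcome_by_commit: commit hash -> 'PASS'/'FAIL' for one test.
        Returns (older, newer) pairs of adjacent commits that both have
        results and whose outcomes differ."""
        flips = []
        for older, newer in zip(commits, commits[1:]):
            a = outcome_by_commit.get(older)
            b = outcome_by_commit.get(newer)
            if a is not None and b is not None and a != b:
                flips.append((older, newer))
        return flips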
I think this could be done more generally via frequency-domain analysis (FFT)
of test outcomes over git history, which would also detect cases of a flaky
test changing its failure frequency. But here I'm getting waaay ahead of
myself :D
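Getting equally ahead of myself, here's a toy sketch of that idea: encode the
outcomes of one test over history-ordered commits as a 0/1 signal and compare
the spectra of two windows. Purely illustrative:

    import numpy as np

    def outcome_spectrum(outcomes):
        """outcomes: sequence of 'PASS'/'FAIL' for one test, in git history
        order."""
        signal = np.array([1.0 if o == "FAIL" else 0.0 for o in outcomes])
        signal -= signal.mean()   # remove the DC component (failure rate)
        return np.abs(np.fft.rfft(signal))

    # Hypothetical example: compare the spectra of two consecutive windows
    # of 64 results each; a large difference hints that the failure pattern,
    # not just the rate, has changed.
    # older, newer = outcomes[-128:-64], outcomes[-64:]
    # change = np.linalg.norm(outcome_spectrum(older) -
    #                         outcome_spectrum(newer))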
Anyway, these are my ideas for KCIDB. I would love to hear your ideas as well
as your feedback on the above. Email, IRC, Slack, or a video call would all
do :D
--
One comment regarding the prototype you shared is that it's quite verbose,
and it's difficult to get a feel for what's been happening from the
overabundance of textual information. I think a visual touch could help here,
e.g. drawing a timeline of test results and marking particular events (first
failed, first passed, stability and so on) along its length.
So instead of this:
> first failed: today (2023-08-02)
>
> kernel: chromeos-stable-20230802.0
> commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
>
> most recent fail: today (2023-08-02)
>
> kernel: chromeos-stable-20230802.0
> commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
>
> last passed: 1 day ago (2023-08-01)
>
> kernel: chromeos-stable-20230801.1
> commit: cd496545d91d820441277cd6a855b9af725fdb8a
Something like this (roughly):
                |
    2023-08-02  F  - last FAIL
                F
                |
                P
                F
                |
    2023-08-02  F  - first FAIL
                |
    2023-08-01  P  - last PASS
                |
                P
And e.g. have the commit and other extra info pop up as needed when hovering
over the status (F/P) letters/icons.
And in general try to express information more visually, so it could be
absorbed at a glance, without needing to read much text, and tuck away
information that's not immediately necessary into more on-hover popups.
---
Hope this helps, and thanks for reading through :D
Nick