Re: RFC - kernel selftest result documentation (KTAP)

From: David Gow
Date: Mon Jun 22 2020 - 22:58:51 EST


On Sat, Jun 20, 2020 at 11:03 PM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>
> On 2020-06-20 01:44, David Gow wrote:
> > On Sat, Jun 20, 2020 at 1:58 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> >>
> >> On 2020-06-16 07:08, Paolo Bonzini wrote:
> >>> On 15/06/20 21:07, Bird, Tim wrote:
> >
> >>>>>> Finally,
> >>>>>> - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
> >>>>>> See https://testanything.org/tap-version-13-specification.html
> >>>>>
> >>>>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me "it
> >>>>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
> >>>>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
> >>>>> should be "ok"? I guess we should change it to "ok".
> >>>
> >>> See above for XFAIL.
> >>>
> >>> I initially raised the issue with "SKIP" because I have a lot of tests
> >>> that depend on hardware availability---for example, a test that does not
> >>> run on some processor kinds (e.g. on AMD, or old Intel)---and for those
> >>> SKIP should be considered a success.
> >>
> >> No, SKIP should not be considered a success. It should also not be considered
> >> a failure. Please do not blur the lines between success, failure, and
> >> skipped.
> >
>
>
> > I agree that skipped tests should be their own thing, separate from
> > success and failure, but in practice they tend to behave more like
> > a success than a failure.
> >
> > I guess the important note here is that a suite of tests, some of
> > which are SKIPped, can be listed as having passed, so long as none of
> > them failed. So, the rule for "bubbling up" test results is that any
> > failures cause the parent to fail, the parent is marked as skipped if
> > _all_ subtests are skipped, and otherwise is marked as having
> > succeeded. (Reversing the last part: having a suite be marked as
> > skipped if _any_ of the subtests are skipped also makes sense, and has
> > its advantages, but anecdotally seems less common in other systems.)
>
> That really caught my attention as something to be captured in the spec.
>
> My initial response was that bubbling up results is the domain of the
> test analysis tools, not the test code.

KUnit is actually sitting in the middle. Results are bubbled up from
individual tests to the test suites in-kernel (by the common KUnit
code), as the suites are TAP tests (individual test cases being
subtests), and so need to provide results. The kunit.py script then
bubbles those results up (using the same rules) to print a summary.
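
As a rough illustration of those rules (this is not the actual KUnit or
kunit.py code; the names `Result` and `bubble_up` are made up for this
sketch, and treating an empty suite as passed is my own assumption), the
logic is something like:

```python
from enum import Enum


class Result(Enum):
    PASS = "ok"
    FAIL = "not ok"
    SKIP = "skip"


def bubble_up(children):
    """Derive a parent's result from its subtests' results.

    Rules, as described above:
      - any failed subtest fails the parent;
      - the parent is skipped only if *all* subtests were skipped;
      - otherwise the parent passes.
    """
    children = list(children)
    if any(r is Result.FAIL for r in children):
        return Result.FAIL
    # Assumption: a suite with no subtests counts as passed, not skipped.
    if children and all(r is Result.SKIP for r in children):
        return Result.SKIP
    return Result.PASS
```

The same function can be applied at each level: once to turn test case
results into a suite result, and again to turn suite results into an
overall summary.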

> If I were writing a test analysis tool, I would want the user to have
> the ability to configure the bubble up rules. Different use cases
> would desire different rules.

I tend to agree: it'd be nice if test analysis tools could implement
different rules here. If we're using TAP subtests, though, the parent
tests do need to return a result in the test code, so either that
needs to be test-specific (if the parent test is not just a simple
union of its subtests), or it could be ignored by an analysis tool
which would follow its own rules. (In either case, it may make sense
to be able to configure a test analysis tool to always fail or flag
tests that have failed or skipped subtests, even if the parent's own
result is "ok", but not vice versa -- a test which failed would stay
failed, even if all its subtests passed.)
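
To make that asymmetry concrete, here is a minimal sketch of what a
configurable analysis tool might do. The function name, the string
result values, and both option names are hypothetical, not taken from
any existing tool:

```python
def effective_result(reported, subtests,
                     fail_on_failed=True, skip_on_skipped=False):
    """Return the result an analysis tool would show for a parent test.

    reported: the parent's own result: "pass", "fail" or "skip".
    subtests: list of subtest results using the same three strings.
    """
    # A parent that reported failure stays failed, whatever its subtests say.
    if reported == "fail":
        return "fail"
    # Optionally downgrade an "ok" parent that has a failed subtest.
    if fail_on_failed and "fail" in subtests:
        return "fail"
    # Optionally mark the parent as skipped if any subtest was skipped.
    if skip_on_skipped and "skip" in subtests:
        return "skip"
    return reported
```

With the defaults, effective_result("pass", ["pass", "fail"]) gives
"fail", while effective_result("fail", ["pass", "pass"]) stays "fail":
the tool can make a result stricter, but never more lenient.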

> My second response was to start thinking about whether the tests
> themselves should have any sort of bubble up implemented. I think
> it is a very interesting question. My current mindset is that
> each test is independent, and there is not a concept of an umbrella
> test that is the union of a set of subtests. But maybe there is
> value to umbrella tests. If there is a concept of umbrella tests
> then I think the spec should define how skip bubbles up.
>

KUnit suites are definitely that kind of "umbrella test" at the moment.

> >
> > The other really brave thing one could do to break from the TAP
> > specification would be to add a "skipped" value alongside "ok" and
> > "not ok", and get rid of the whole "SKIP" directive/comment stuff.
> > Possibly not worth the departure from the spec, but it would sidestep
> > part of the problem.
>
> I like being brave in this case. Elevating SKIP to be a peer of
> "ok" and "not ok" provides a clearer model in which SKIP is a first
> class citizen. It also removes the muddled thinking that the
> current model promotes.
>
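
For what it's worth, a parser would not get much more complicated
either way. Below is a rough sketch that accepts both the TAP13-style
"# SKIP" directive and a purely hypothetical "skipped" result keyword
(that keyword is not in any spec, and `classify` is just a name I made
up for the example):

```python
import re

# TAP13 style:        "ok 3 - name # SKIP reason"
# kselftest practice: "not ok 3 - name # SKIP reason"
# Hypothetical style: "skipped 3 - name"   (not part of any spec)
LINE = re.compile(
    r"^(?P<status>not ok|ok|skipped)\b"   # result keyword
    r"(?:\s+(?P<num>\d+))?"               # optional test number
    r"(?:\s+-)?\s*(?P<desc>[^#]*?)\s*"    # optional "- description"
    r"(?:#\s*(?P<directive>\w+)\b.*)?$"   # optional "# DIRECTIVE reason"
)


def classify(line):
    """Map one result line to "pass", "fail" or "skip"."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError("not a test result line: %r" % line)
    # A first-class "skipped" keyword needs no directive parsing at all.
    if m["status"] == "skipped":
        return "skip"
    # Otherwise a SKIP directive wins over "ok" / "not ok".
    directive = (m["directive"] or "").upper()
    if directive.startswith("SKIP"):
        return "skip"
    # Other directives (e.g. TODO) are ignored in this sketch.
    return "pass" if m["status"] == "ok" else "fail"
```

Note that with the directive form the parser has to decide what
"not ok ... # SKIP" means, whereas with a separate keyword the question
does not come up.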
> >
> >
> > Cheers,
> > -- David
> >
>