Re: RFC - kernel selftest result documentation (KTAP)

From: Frank Rowand
Date: Fri Jun 19 2020 - 15:49:10 EST


On 2020-06-15 14:07, Bird, Tim wrote:
> Kees,
>
> Thanks for the great feedback. See comments inline below.
>
>> -----Original Message-----
>> From: Kees Cook <keescook@xxxxxxxxxxxx>
>>
>> On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
>>> The kernel test result format consists of 5 major elements,
>>> 4 of which are line-based:
>>> * the output version line
>>> * the plan line
>>
>> Note: making the plan line required differs from TAP13 and TAP14. I
>> think it's the right choice, but we should be clear.
>
> Noted. In re-reading my doc, I've conflated my sections. The first
> section is "single-line", and the next section is "optional". ???
> I'll fix that.
>
> With regards to making it optional or not, I don't have a strong
> preference. The extra info seems helpful in some circumstances.
> I don't know if it's too onerous to make it a requirement or not.
> I'd prefer if it was always there (either at the beginning or the end),
> but if there is some situation where it's quite difficult to calculate,
> then it would be best not to mandate it. I can't think of any impossible
> situations at the moment.
>
>>
>>> * one or more test result lines (also called test lines)
>>> * a possible "Bail out!" line
>>
>> "Bail out!" to be moved to "optional" elements, since it may not appear.
>> And we should clarify TAP13 and TAP14's language to say it should only
>> appear when the test is aborting without running later tests -- for this
>> reason, I think the optional "description" following "Bail out!" should
>> be made required. I.e. it must be: "Bail out! $reason"
>
> I'll make sure this is listed as optional.
> I like adding a mandatory reason.
>>
>>> optional elements:
>>> * diagnostic data
>>
>> nit: diagnostic lines (not data)
> OK.
>
>>
>>> The 5th element is diagnostic information, which is used to describe
>>> items running in the test, and possibly to explain test results.
>>> A sample test result is shown below:
>>>
>>> Some other lines may be added by the test harness, and are not emitted
>>> by individual test programs:
>>> * one or more test identification lines
>>> * a possible results summary line
>>>
>>> Here is an example:
>>>
>>> TAP version 13
>>> 1..1
>>> # selftests: cpufreq: main.sh
>>> # pid 8101's current affinity mask: fff
>>> # pid 8101's new affinity mask: 1
>>> ok 1 selftests: cpufreq: main.sh
>>
>> Nit: for examples, I think this should show more than one test.
>> (Preferably, it should show all the possible cases, ok, not ok, SKIP,
>> etc.)
> Agree. I will expand this.
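> Perhaps something like this (illustrative only, not output from a real
> test run):
>
> TAP version 13
> 1..4
> ok 1 selftests: cpufreq: main.sh
> not ok 2 selftests: cpufreq: special.sh
> not ok 3 selftests: cpufreq: suspend.sh # SKIP
> ok 4 selftests: cpufreq: config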
>
>>
>>> The output version line is: "TAP version 13"
>>>
>>> The test plan is "1..1".
>>>
>>> Element details
>>> ===============
>>>
>>> Output version line
>>> -------------------
>>> The output version line is always "TAP version 13".
>>>
>>> Although the kernel test result format has some additions
>>> to the TAP13 format, the version line reported by kselftest tests
>>> is (currently) always the exact string "TAP version 13".
>>>
>>> This is always the first line of test output.
>>>
>>> Test plan line
>>> --------------
>>> The test plan indicates the number of individual test cases intended to
>>> be executed by the test. It always starts with "1.." and is followed
>>> by the number of test cases. In the example above, "1..1" indicates
>>> that this test reports only 1 test case.
>>>
>>> The test plan line can be placed in two locations:
>>> * the second line of test output, when the number of test cases is known
>>> in advance
>>> * as the last line of test output, when the number of test cases is not
>>> known in advance.
>>>
>>> Most often, the number of test cases is known in advance, and the test plan
>>> line appears as the second line of test output, immediately following
>>> the output version line. The number of test cases might not be known
>>> in advance if the number of tests is calculated from runtime data.
>>> In this case, the test plan line is emitted as the last line of test
>>> output.
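>>>
>>> For example, a test that only discovers its test cases at runtime
>>> might emit the plan line last (hypothetical output):
>>>
>>> TAP version 13
>>> ok 1 first-discovered-case
>>> ok 2 second-discovered-case
>>> 1..2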
>>
>> "... must be ..." ?
>>
>>>
>>> Test result lines
>>> -----------------
>>> The test output consists of one or more test result lines that indicate
>>> the actual results for the test. These have the format:
>>>
>>> <result> <number> <description> [<directive>] [<diagnostic data>]
>>
>> This should be:
>>
>> <result> <number> <description> [# [<directive> ][<diagnostic data>]]
>>
>>>
>>> The ''result'' must appear at the start of a line (except for when a
>>> test is nested, see below), and must consist of one of the following
>>> two phrases:
>>> * ok
>>> * not ok
>>>
>>> If the test passed, then the result is reported as "ok". If the test
>>> failed, then the result is reported as "not ok". These must be in
>>> lower case, exactly as shown.
>>>
>>> The ''number'' in the test result line represents the number of the
>>> test case being performed by the test program. This is often used by
>>> test harnesses as a unique identifier for each test case. The test
>>> number is a base-10 number, starting with 1. It should increase by
>>> one for each new test result line emitted. If possible, the number
>>> for a test case should be kept the same over the lifetime of the test.
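>>>
>>> For example (a hypothetical test case, showing all of the fields):
>>>
>>> not ok 2 selftests: cpufreq: resume # SKIP no suspend support on this board
>>>
>>> Here "not ok" is the result, "2" is the number, "selftests: cpufreq: resume"
>>> is the description, "# SKIP" is a directive, and the remaining text is
>>> diagnostic data.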
>>>
>>> The ''description'' is a short description of the test case.
>>> This can be any string of words, but should avoid using colons (':')
>>
>> Must also avoid "#".
> ok.
>>
>>> except as part of a fully qualified test case name (see below).
>>
>> TAP13/14 makes description optional, are we making it required (I think
>> we should). There seems to be a TAP13/14 "convention" of starting
>> <description> with "- ", which I'm on the fence about. It does make
>> parsing maybe a little easier.
>
> I would like the description to be required.

Agree, description should be required.

-Frank

> I don't have a strong opinion on the dash. I'm OK with either one (dash
> or no dash), but we should make kselftest and KUnit consistent.
>
>>
>>> Finally, it is possible to use a test directive to indicate another
>>> possible outcome for a test: that it was skipped. To report that
>>> a test case was skipped, the result line should start with the
>>> result "not ok", and the directive "# SKIP" should be placed after
>>> the test description. (Note that this deviates from the TAP13
>>> specification).
>>
>> This is what TAP14 changed, I think (i.e. directive follows description
>> now).
>>
>>>
>>> A test may be skipped for a variety of reasons, ranging from
>>> insufficient privileges to missing features or resources required
>>> to execute that test case.
>>>
>>> It is usually helpful if a diagnostic message is emitted to explain
>>> the reasons for the skip. If the message is a single line and is
>>> short, the diagnostic message may be placed after the '# SKIP'
>>> directive on the same line as the test result. However, if it does
>>> not fit on the test result line, it should precede the test result line (see
>>> diagnostic data, next).
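>>>
>>> For example, either of the following (hypothetical output) would be
>>> acceptable:
>>>
>>> not ok 4 check-governors # SKIP CONFIG_CPU_FREQ not enabled
>>>
>>> or:
>>>
>>> # skipping check-governors: CONFIG_CPU_FREQ not enabled
>>> not ok 4 check-governors # SKIP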
>>>
>>> Diagnostic data
>>> ---------------
>>> Diagnostic data is text that reports on test conditions or test
>>> operations, or that explains test results. In the kernel test
>>> result format, diagnostic data is placed on lines that start with a
>>> hash sign, followed by a space ('# ').
>>>
>>> One special format of diagnostic data is a test identification line,
>>> that has the fully qualified name of a test case. Such a test
>>> identification line marks the start of test output for a test case.
>>>
>>> In the example above, there are three lines that start with '#'
>>> which precede the test result line:
>>> # selftests: cpufreq: main.sh
>>> # pid 8101's current affinity mask: fff
>>> # pid 8101's new affinity mask: 1
>>> These are used to indicate diagnostic data for the test case
>>> 'selftests: cpufreq: main.sh'
>>>
>>> Material in comments between the identification line and the test
>>> result line is diagnostic data that can help to interpret the
>>> results of the test.
>>>
>>> The TAP specification indicates that automated test harnesses may
>>> ignore any line that is not one of the mandatory prescribed lines
>>> (that is, the output format version line, the plan line, a test
>>> result line, or a "Bail out!" line.)
>>>
>>> Bail out!
>>> ---------
>>> If a line in the test output starts with 'Bail out!', it indicates
>>> that the test was aborted for some reason: the test is unable to
>>> proceed, and no additional tests will be performed.
>>>
>>> This can be used at the very beginning of a test, or anywhere in the
>>> middle of the test, to indicate that the test can not continue.
>>
>> I think the required syntax should be:
>>
>> Bail out! <reason>
>>
>> And to make it clear that this is optionally used to indicate an early
>> abort. (Though with a leading plan line, a parser should be able to
>> determine this on its own.)
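>>
>> e.g. (with a hypothetical reason):
>>
>> Bail out! could not mount debugfs, unable to continue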
>>
>>> --- from here on is not-yet-organized material
>>>
>>> Tip:
>>> - don't change the test plan based on skipped tests.
>>> - it is better to report that a test case was skipped than to
>>> not report it
>>> - that is, don't adjust the number of test cases based on skipped
>>> tests
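>>>
>>> For example, a test with 3 test cases, one of which is skipped, should
>>> still report a plan of "1..3" (hypothetical output):
>>>
>>> TAP version 13
>>> 1..3
>>> ok 1 first-test
>>> not ok 2 second-test # SKIP not running as root
>>> ok 3 third-test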
>>
>> Yes, totally.
>>
>>> Other things to mention:
>>> TAP13 elements not used:
>>> - yaml for diagnostic messages
>>> - reason: try to keep things line-based, since output from other things
>>> may be interspersed with messages from the test itself
>>
>> Agreed: the yaml extension is not sensible for our use.
>>
>>> - TODO directive
>>
>> Agreed: SKIP should cover everything TODO does.
>>
>>> KTAP Extensions beyond TAP13:
>>> - nesting
>>
>> (I would call this 'subtests')
> Sounds good. Will do.
>
>>
>>> - via indentation
>>> - indentation makes it easier for humans to read
>>
>> And allows for separable parsing of subtests.
> Agree. I'll try to work that into the doc.
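> A nested (subtest) example might look something like this (illustrative
> only, names are made up):
>
> TAP version 13
> 1..1
>     TAP version 13
>     1..2
>     ok 1 cpufreq-governor
>     ok 2 cpufreq-frequency
> ok 1 selftests: cpufreq: main.sh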
>
>>
>>> - test identifier
>>> - multiple parts, separated by ':'
>>
>> This is interesting... is the goal to be able to report test status over
>> time by name?
> Yes. KernelCI and Fuego have the notion of a testcase namespace
> hierarchy. As the number of tests expands it is helpful to have
> the name-space for a sub-test be limited, just like a filesystem hierarchy
> provides scope for the names of objects (directories and files) that
> it contains.
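> For example, "selftests: cpufreq: main.sh" is a fully qualified name,
> where each colon-separated component narrows the scope, much like path
> components in a filesystem hierarchy.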
>
>>
>>> - summary lines
>>> - can be skipped by CI systems that do their own calculations
>>
>> Right -- I think a per-test summary line should be included for the humans
>> reading a single test (which may scroll off the screen).
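>>
>> e.g. something along the lines of what kselftest prints today
>> (illustrative):
>>
>> # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:1 error:0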
>>
>>> Other notes:
>>> - automatic assignment of result status based on exit code
>>
>> This is, I think, a matter for the kselftest running infrastructure, not
>> the KTAP output?
> Agreed. This doesn't have anything to do with the API between
> the tests and the results processor. I'll take it out.
>>
>>> Tips:
>>> - do NOT describe the result in the test line
>>> - the test case description should be the same whether the test
>>> succeeds or fails
>>> - use diagnostic lines to describe or explain results, if this is
>>> desirable
>>
>> Right.
>>
>>> - test numbers are considered harmful
>>> - test harnesses should use the test description as the identifier
>>> - test numbers change when testcases are added or removed
>>> - which means that results can't be compared between different
>>> versions of the test
>>
>> Right.
>>
>>> - recommendations for diagnostic messages:
>>> - reason for failure
>>> - reason for skip
>>> - diagnostic data should always precede the result line
>>> - problem: harness may emit result before test can do assessment
>>> to determine reason for result
>>> - this is what the kernel uses
>>
>> Right.
>>
>>> Differences between kernel test result format and TAP13:
>>> - in KTAP the "# SKIP" directive is placed after the description on
>>> the test result line
>>
>> Right, this is the same as TAP14, IIRC. KTAP's big deltas are the "#"
>> diagnostic lines and the subtest handling.
>>
>>> ====== end of ktap-doc-rfc.txt ======
>>> OK - that's the end of the RFC doc.
>>>
>>> Here are a few questions:
>>> - is this document desired or not?
>>
>> Yes.
> Great. I'll make this a priority to work on.
>
>>
>>> - is it too long or too short?
>>
>> Should be slightly longer: more examples.
>>
>>> - if the document is desired, where should it be placed?
>>> I assume somewhere under Documentation, and put into
>>> .rst format. Suggestions for a name and location are welcome.
>>
>> Yes, Documentation/*.rst. Not sure on name yet, but where do kselftest
>> docs live? :)
> Documentation/dev-tools/kselftest.rst
>
> I'll put this at: Documentation/dev-tools/test-results-format.rst
>
>>
>>> - is this document accurate?
>>> I think KUnit does a few things differently than this description.
>>
>> Let's fix it. :)
>>
>>> - is the intent to have kunit and kselftest have the same output format?
>>> if so, then these should be rationalized.
>>
>> Yes please.
>>
>>> Finally,
>>> - Should a SKIP result be 'ok' (TAP13 spec) or 'not ok' (current kselftest practice)?
>>> See https://testanything.org/tap-version-13-specification.html
>>
>> Oh! I totally missed this. Uhm. I think "not ok" makes sense to me: "it
>> did not run successfully". ... but ... Uhhh ... how do XFAIL and SKIP
>> relate? Neither SKIP nor XFAIL count toward failure, though, so both
>> should be "ok"? I guess we should change it to "ok".
>
> I have the same initial impression. In my mind, a skip is "not ok", since
> the test didn't run. However, a SKIP should be treated differently
> from either "ok" or "not ok" by the results interpreter, so I don't think it
> matters. Originally I was averse to changing the SKIP result to "ok"
> (as suggested by Paulo Bonzini [1]), but I checked and it's pretty trivial to
> change the parser in Fuego, and it would make the kernel results format
> match the TAP13 spec. I don't see a strong reason for us to be different
> from TAP13 here.
>
> I raised this issue on our automated testing conference call last week
> (which includes people from the CKI, Fuego, KernelCI and LKFT projects), and
> so people should be chiming in if their parser will have a problem with this change.
>
> [1] https://lkml.kernel.org/lkml/20200610154447.15826-1-pbonzini@xxxxxxxxxx/T/
>
> Thanks very much for the feedback.
> -- Tim
>