Re: [RFC] KTAP spec v2: prefix to KTAP data

From: David Gow
Date: Sat May 14 2022 - 04:26:42 EST


On Fri, May 13, 2022 at 1:26 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>
> On 5/12/22 10:25, Daniel Latypov wrote:
> > On Wed, May 11, 2022 at 11:01 PM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> >> ================================================================================
> >> #### discussion notes:
> >>
> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >> PRO: minimally invasive to specification.
> >>
> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >> CON:
> >>
> >> KTAP does not include any mechanism to describe the value of <prefix string>
> >> to test harnesses and test output processing programs. The test output
> >> processing programs must infer the value of <prefix string> by detecting
> >> the <prefix string> in the "Version lines".
> >>
> >> The detection of a "Version lines" might be a match of the regex:
> >>
> >> "^.*KTAP version 2$"
> >>
> >> This risks falsely detecting a "Version lines", but the risk is small???
> >
> > Agree this is a risk and also think it's probably small, but it's hard to say.
> > I think the $ anchoring the regex is probably safe enough.
> >
> > As noted earlier, this tracks with what kunit.py already does.
> > That was necessitated by dynamic prefixes such as timestamps, etc.f
>
> That is a good observation. I nearly always have the prefix timestamp
> on my console output, and thus remove the timestamp with a regex when
> processing the data.
>

Yeah: the use of a prefix length for timestamps (and similar 'dynamic
prefixes') works really well as a way to automatically find KTAP
output in the kernel log.
It's definitely less useful as a way of disambiguating several
(potentially interleaved) KTAP streams, though.

Personally, my conception of KTAP is that the prefix is not a part of
the format, but a way (possibly one we should recommend in the spec)
to isolate KTAP output from (e.g.) the kernel log.
So, in theory, the processing pipeline for log -> result is:
[dmesg] ---<remove prefix>--> [KTAP] ---<parse>---> [results]

I wouldn't object to having the prefix be a part of KTAP if it were
particularly useful to anyone, though, and definitely agree we should
note ways to do it.

Either way, we do end up with two (slightly conflicting) uses of prefixes:
a) To isolate tests from a log.
b) To allow multiple concurrent tests.

For a), Alternative 1 seems obviously correct to me, since whatever
module is being used might not know what random prefixes are being
added to its log lines.
For b), either would work, but we do potentially get conflicts with a)
if we're trying to do both. So maybe alternative 2 makes sense for
that.

>
> > So I think this is probably a fine risk to take.
> >
> > I imagine we could add constraints of prefix string, e.g. must have []
> > around it, etc. if we want to try and minimize this risk.
> > But I don't know if it's necessarily worth it, given what we know right now.
> >
> > Along those lines, I think I like this approach (Alternative 1) more
> > than Alternative 2/2b.

I agree, though note that we might need both if we want two separate
types of prefix as mentioned above.

> > I'm not sure we need a structured way to specify metadata in KTAP yet?
> > The prefix seems like a reasonable candidate, but do others have ideas
> > of other bits of metadata we'd want to be able to declare?
> >

That's the important takeaway from Alternative 2: some generic way of
having metadata could be quite handy. (Of them, Alternative 2b feels a
little more like a hack than I'd prefer, albeit a clever one. I'd
prefer a cleaner "configuration info line" personally, though if
reusing the test results for test 0 makes parsing significantly easier
for others, it may nevertheless be worth doing.)

As for other pieces of metadata, there are all sorts of useful
information about the test which could be useful to store alongside
the results (what tool generated it, kernel version, etc), though
those might be more useful when saving results elsewhere than embedded
into the kernel output.

Equally, if we extended these configuration/info lines to individual
test/subtests, things like test size, originating module, etc could
potentially be stored. Suite/test init functions could potentially add
useful metadata, too (e.g., number of cores/threads for kcsan, etc).

None of those are, I think, quite compelling enough individually to
add a config system, though. If we already had one for the prefix, and
other things which are necessary for parsing, maybe...

-- David