Re: Functional Coverage via RV? (was: "Learning-based Controlled Concurrency Testing")

From: Dmitry Vyukov
Date: Sat Jun 19 2021 - 07:08:36 EST


On Fri, Jun 18, 2021 at 1:26 PM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Fri, Jun 18, 2021 at 09:58AM +0200, Daniel Bristot de Oliveira wrote:
> > On 6/17/21 1:20 PM, Marco Elver wrote:
> > > [+Daniel, just FYI. We had a discussion about "functional coverage"
> > > and fuzzing, and I've just seen your wonderful work on RV. If you have
> > > thought about fuzzing with RV and how coverage of the model impacts
> > > test generation, I'd be curious to hear.]
> >
> > One aspect of RV is that we verify the actual execution of the system instead of
> > a complete model of the system, so we depend of the testing to cover all the
> > aspects of the system <-> model.
> >
> > There is a natural relation with testing/fuzzing & friends with RV.
> >
> > > Looks like there is ongoing work on specifying models and running them
> > > along with the kernel: https://lwn.net/Articles/857862/
> > >
> > > Those models that are run alongside the kernel would have their own
> > > coverage, and since there's a mapping between real code and model, a
> > > fuzzer trying to reach new code in one or the other will ultimately
> > > improve coverage for both.
> >
> > Perfect!
> >
> > > Just wanted to document this here, because it seems quite relevant.
> > > I'm guessing that "functional coverage" would indeed be a side-effect
> > > of a good RV model?
> >
> > So, let me see if I understood the terms. Functional coverage is a way to check
> > if all the desired aspects of a code/system/subsystem/functionality were covered
> > by a set of tests?
>
> Yes, unlike code/structural coverage (which is what we have today via
> KCOV) functional coverage checks if some interesting states were reached
> (e.g. was buffer full/empty, did we observe transition a->b etc.).
>
> Functional coverage is common in hardware verification, but of course
> software verification would benefit just as much -- just haven't seen it
> used much in practice yet.
> [ Example for HW verification: https://www.chipverify.com/systemverilog/systemverilog-functional-coverage ]
>
> It still requires some creativity from the designer/developer to come up
> with suitable functional coverage. State explosion is a problem, too,
> and naturally it is impractical to capture all possible states ... after
> all, functional coverage is meant to direct the test generator/fuzzer
> into more interesting states -- we're not doing model checking after all.
>
> > If that is correct, we could use RV to:
> >
> > - create an explicit model of the states we want to cover.
> > - check if all the desired states were visited during testing.
> >
> > ?
>
> Yes, pretty much. On one hand there could be an interface to query if
> all states were covered, but I think this isn't useful out-of-the box.
> Instead, I was thinking we can simply get KCOV to help us out: my
> hypothesis is that most of this would happen automatically if dot2k's
> generated code has distinct code paths per transition.
>
> If KCOV covers the RV model (since it's executable kernel C code), then
> having distinct code paths for "state transitions" will effectively give
> us functional coverage indirectly through code coverage (via KCOV) of
> the RV model.
>
> From what I can tell this doesn't quite happen today, because
> automaton::function is a lookup table as an array. Could this just
> become a generated function with a switch statement? Because then I
> think we'd pretty much have all the ingredients we need.
>
> Then:
>
> 1. Create RV models for states of interests not covered by normal code
> coverage of code under test.
>
> 2. Enable KCOV for everything.
>
> 3. KCOV's coverage of the RV model will tell us if we reached the
> desired "functional coverage" (and can be used by e.g. syzbot to
> generate better tests without any additional changes because it
> already talks to KCOV).
>
> Thoughts?

I think there is usually already some code for any important state
transitions. E.g. I can't imagine how a socket can transition to
active/listen/shutdown/closed states w/o any code.

I see RV to be potentially more useful for the "coverage dimensions"
idea. I.e. for sockets that would be treating coverage for a socket
function X as different coverage based on the current socket state,
effectively consider (PC,state) as feedback signal.
But my concern is that we don't want to simply consider combinations
of all kernel code multiplied by all combinations of states of all RV
models. Most likely this will lead to severe feedback signal
explosion. So the question is: how do we understand that the socket
model relates only to this restricted set of code?