Re: Functional Coverage via RV? (was: "Learning-based Controlled Concurrency Testing")

From: Daniel Bristot de Oliveira
Date: Mon Jun 21 2021 - 04:39:58 EST


On 6/19/21 1:08 PM, Dmitry Vyukov wrote:
> On Fri, Jun 18, 2021 at 1:26 PM Marco Elver <elver@xxxxxxxxxx> wrote:
>>
>> On Fri, Jun 18, 2021 at 09:58AM +0200, Daniel Bristot de Oliveira wrote:
>>> On 6/17/21 1:20 PM, Marco Elver wrote:
>>>> [+Daniel, just FYI. We had a discussion about "functional coverage"
>>>> and fuzzing, and I've just seen your wonderful work on RV. If you have
>>>> thought about fuzzing with RV and how coverage of the model impacts
>>>> test generation, I'd be curious to hear.]
>>>
>>> One aspect of RV is that we verify the actual execution of the system instead of
>>> a complete model of the system, so we depend of the testing to cover all the
>>> aspects of the system <-> model.
>>>
>>> There is a natural relation with testing/fuzzing & friends with RV.
>>>
>>>> Looks like there is ongoing work on specifying models and running them
>>>> along with the kernel: https://lwn.net/Articles/857862/
>>>>
>>>> Those models that are run alongside the kernel would have their own
>>>> coverage, and since there's a mapping between real code and model, a
>>>> fuzzer trying to reach new code in one or the other will ultimately
>>>> improve coverage for both.
>>>
>>> Perfect!
>>>
>>>> Just wanted to document this here, because it seems quite relevant.
>>>> I'm guessing that "functional coverage" would indeed be a side-effect
>>>> of a good RV model?
>>>
>>> So, let me see if I understood the terms. Functional coverage is a way to check
>>> if all the desired aspects of a code/system/subsystem/functionality were covered
>>> by a set of tests?
>>
>> Yes, unlike code/structural coverage (which is what we have today via
>> KCOV) functional coverage checks if some interesting states were reached
>> (e.g. was buffer full/empty, did we observe transition a->b etc.).
>>
>> Functional coverage is common in hardware verification, but of course
>> software verification would benefit just as much -- just haven't seen it
>> used much in practice yet.
>> [ Example for HW verification: https://www.chipverify.com/systemverilog/systemverilog-functional-coverage ]
>>
>> It still requires some creativity from the designer/developer to come up
>> with suitable functional coverage. State explosion is a problem, too,
>> and naturally it is impractical to capture all possible states ... after
>> all, functional coverage is meant to direct the test generator/fuzzer
>> into more interesting states -- we're not doing model checking after all.
>>
>>> If that is correct, we could use RV to:
>>>
>>> - create an explicit model of the states we want to cover.
>>> - check if all the desired states were visited during testing.
>>>
>>> ?
>>
>> Yes, pretty much. On one hand there could be an interface to query if
>> all states were covered, but I think this isn't useful out-of-the box.
>> Instead, I was thinking we can simply get KCOV to help us out: my
>> hypothesis is that most of this would happen automatically if dot2k's
>> generated code has distinct code paths per transition.
>>
>> If KCOV covers the RV model (since it's executable kernel C code), then
>> having distinct code paths for "state transitions" will effectively give
>> us functional coverage indirectly through code coverage (via KCOV) of
>> the RV model.
>>
>> From what I can tell this doesn't quite happen today, because
>> automaton::function is a lookup table as an array. Could this just
>> become a generated function with a switch statement? Because then I
>> think we'd pretty much have all the ingredients we need.
>>
>> Then:
>>
>> 1. Create RV models for states of interests not covered by normal code
>> coverage of code under test.
>>
>> 2. Enable KCOV for everything.
>>
>> 3. KCOV's coverage of the RV model will tell us if we reached the
>> desired "functional coverage" (and can be used by e.g. syzbot to
>> generate better tests without any additional changes because it
>> already talks to KCOV).
>>
>> Thoughts?
>
> I think there is usually already some code for any important state
> transitions. E.g. I can't imagine how a socket can transition to
> active/listen/shutdown/closed states w/o any code.

makes sense...

> I see RV to be potentially more useful for the "coverage dimensions"
> idea. I.e. for sockets that would be treating coverage for a socket
> function X as different coverage based on the current socket state,
> effectively consider (PC,state) as feedback signal.

How can RV subsystem talk with KCOV?

> But my concern is that we don't want to simply consider combinations
> of all kernel code multiplied by all combinations of states of all RV
> models.

I agree! Also because RV monitors will generally monitor an specific part of the
code (with exceptions for models like the preemption one).

Most likely this will lead to severe feedback signal
> explosion.So the question is: how do we understand that the socket
> model relates only to this restricted set of code?
>

Should we annotate a model, saying which subsystem it monitors/verify?

-- Daniel