Re: [PATCH] kcsan: Add test suite

From: Marco Elver
Date: Tue May 05 2020 - 09:02:03 EST


On Tue, 5 May 2020 at 07:00, David Gow <davidgow@xxxxxxxxxx> wrote:
>
> On Mon, Apr 27, 2020 at 11:23 PM 'Marco Elver' via kasan-dev
> <kasan-dev@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, 27 Apr 2020 at 16:35, Marco Elver <elver@xxxxxxxxxx> wrote:
> > >
> > > This adds a KCSAN test focusing on the behaviour of the integrated
> > > runtime. It tests various race scenarios and verifies the reports
> > > generated to the console. It makes use of KUnit for test organization,
> > > and of the Torture framework for test thread control.
> > >
> > > Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> > > ---
> >
> > +KUnit devs
> > We had some discussions on how best to test sanitizer runtimes, and we
> > believe that this test is roughly what testing sanitizer runtimes
> > should look like. Note that, for KCSAN, there are various additional
> > complexities, like multiple threads, and report generation isn't
> > entirely deterministic (we need to run some number of iterations to
> > get reports, may get multiple reports, etc.).
>
> Thanks very much for writing the test. I do think that it goes a
> little outside what we'd normally expect of a unit test (notably with
> the issues around determinism and threading), but it's good to see
> KUnit being pushed in new directions a bit.
>
> The biggest issue in my mind is the possibility that the
> non-determinism of the tests could cause false positives. If we're
> trying to run as many KUnit tests as possible as part of continuous
> integration systems or as a condition for accepting patches, having
> flaky tests could be annoying. The KCSAN tests seem to break/fail
> as-is when run on single-core machines (at least, under qemu), so some
> way of documenting this as a requirement would probably be necessary,
> too.

True. Although note that we require CONFIG_KCSAN=y for this test to be
enabled, so I don't think it's a big problem for regular CI setups.
For a KCSAN setup, I'd expect that we already know that running on a
single-core system doesn't yield very interesting results, regardless
of which tests are run.

The non-deterministic nature of concurrent tests will never entirely
go away, but I think that with the right preconditions met (at least
N CPUs, where N depends on PREEMPT_NONE, PREEMPT_VOLUNTARY, or
PREEMPT), the tests here should not normally fail.

> One possibility would be to add support for "skipped" tests to KUnit
> (the TAP specification allows for it), so that the KCSAN test could
> detect cases where it's not reliable, and skip itself (leaving a note
> as to why). In the short term, though, we'd absolutely need some
> documentation around the dependencies for the test.

That would be nice. For the time being, I will add a precondition
check to test_init(), and print a warning if the test needs to be
skipped.
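
For concreteness, a rough sketch of the kind of check I have in mind
(the CPU thresholds and messages here are illustrative only, not
necessarily what will land in v2):

    static int test_init(struct kunit *test)
    {
            /* Without forced preemption we need more CPUs to provoke races. */
            int min_cpus = IS_ENABLED(CONFIG_PREEMPT) ? 2 : 8;

            if (num_online_cpus() < min_cpus) {
                    pr_warn("only %u online CPUs, want at least %d; skipping tests\n",
                            num_online_cpus(), min_cpus);
                    return -EINVAL; /* no real "skipped" status in KUnit yet */
            }

            /* ... existing per-test setup ... */
            return 0;
    }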

> (For the record, the failures I saw were all due to running under qemu
> emulating as a uniprocessor/single-core machine: with
> CONFIG_PREEMPT_VOLUNTARY, it would just hang after creating the first
> couple of threads. With CONFIG_PREEMPT, the tests completed, but the
> majority of them failed.)

Right, let me try to fix those at least. I'll send v2.

(Paul: If you prefer a separate patch rather than v2, let me know.)

> > The main thing, however, is that we want to verify the actual output
> > (or absence of it) to console. This is what the KCSAN test does using
> > the 'console' tracepoint. Could KUnit provide some generic
> > infrastructure to check console output, like is done in the test here?
> > Right now I couldn't say what the most useful generalization of this
> > would be (without it just being a wrapper around the console
> > tracepoint), because the way I've decided to capture and then match
> > console output is quite test-specific. For now we can replicate this
> > logic on a per-test basis, but it would be extremely useful if there
> > was a generic interface that KUnit could provide in future.
>
> This is something we've discussed here a couple of times as well.
> I'll confess to being a little wary of having tests rely too heavily
> on console output: it risks being a bit fragile if the exact contents
> or formatting of messages change, or ends up requiring a lot of string
> formatting and/or parsing code in the tests. I do agree,
> though, that it probably needs to be at least a part of testing things
> like sanitizers where the ultimate goal is to produce console output.
> I'm not exactly sure how we'd implement it yet, so it's probably not
> going to happen extremely soon, but what you have here looks to me
> like a good example we can generalise as needed.

For the sanitizers, that fragility with respect to formatting etc. is
exactly what we want, since any change in console output could
indicate a bug. But as you say, for other tests it might not make much
sense.
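
For reference, the capture side in the test is essentially a probe on
the 'console' tracepoint that copies each emitted line into a buffer,
which the test cases then search for the expected report fragments.
Roughly (identifiers and buffer handling simplified for illustration):

    #include <linux/spinlock.h>
    #include <linux/string.h>
    #include <trace/events/printk.h>

    static char observed_buf[1024];
    static size_t observed_len;
    static DEFINE_SPINLOCK(observed_lock);

    /* Copies console output as it is printed, for later matching. */
    static void probe_console(void *ignore, const char *buf, size_t len)
    {
            unsigned long flags;

            spin_lock_irqsave(&observed_lock, flags);
            if (observed_len + len < sizeof(observed_buf)) {
                    memcpy(observed_buf + observed_len, buf, len);
                    observed_len += len;
            }
            spin_unlock_irqrestore(&observed_lock, flags);
    }

    /* Attached once when the suite starts (requires CONFIG_TRACEPOINTS): */
    /*     register_trace_console(probe_console, NULL);                   */
    /* and detached again when it finishes:                               */
    /*     unregister_trace_console(probe_console, NULL);                 */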

Thanks,
-- Marco