Re: Kernel Concurrency Sanitizer (KCSAN)

From: Dmitry Vyukov
Date: Fri Oct 04 2019 - 13:01:53 EST


On Fri, Oct 4, 2019 at 6:57 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Oct 04, 2019 at 06:52:49PM +0200, Dmitry Vyukov wrote:
> > On Fri, Oct 4, 2019 at 6:49 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 02, 2019 at 09:51:58PM +0200, Marco Elver wrote:
> > > > Hi Joel,
> > > >
> > > > On Tue, 1 Oct 2019 at 23:19, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > We would like to share a new data-race detector for the Linux kernel:
> > > > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > > > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > > > > >
> > > > > > To those of you who we mentioned at LPC that we're working on a
> > > > > > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> > > > > > renamed it to KCSAN to avoid confusion with KTSAN).
> > > > > > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf
> > > > > >
> > > > > > In the coming weeks we're planning to:
> > > > > > * Set up a syzkaller instance.
> > > > > > * Share the dashboard so that you can see the races that are found.
> > > > > > * Attempt to send fixes for some races upstream (if you find that the
> > > > > > kcsan-with-fixes branch contains an important fix, please feel free to
> > > > > > point it out and we'll prioritize that).
> > > > > >
> > > > > > There are a few open questions:
> > > > > > * The big one: most of the reported races are due to unmarked
> > > > > > accesses; prioritization or pruning of races to focus initial efforts
> > > > > > to fix races might be required. Comments on how best to proceed are
> > > > > > welcome. We're aware that these are issues that have recently received
> > > > > > attention in the context of the LKMM
> > > > > > (https://lwn.net/Articles/793253/).
> > > > > > * How/when to upstream KCSAN?
> > > > >
> > > > > Looks exciting. I think based on our discussion at LPC, you mentioned
> > > > > one way of pruning is if the compiler generated different code with _ONCE
> > > > > annotations than what would have otherwise been generated. Is that still on
> > > > > the table, for the purposing of pruning the reports?
> > > >
> > > > This might be interesting at first, but it's not entirely clear how
> > > > feasible it is. It's also dangerous, because the real issue would be
> > > > ignored. It may be that one compiler version on a particular
> > > > architecture generates the same code, but any change in compiler or
> > > > architecture and this would no longer be true. Let me know if you have
> > > > any more ideas.
> > >
> > > My thought was this technique of looking at compiler generated code can be
> > > used for prioritization of the reports. Have you tested it though? I think
> > > without testing such technique, we could not know how much of benefit (or
> > > lack thereof) there is to the issue.
> > >
> > > In fact, IIRC, the compiler generating different code with _ONCE annotation
> > > can be given as justification for patches doing such conversions.
> >
> >
> > We also should not forget about "missed mutex" races (e.g. unprotected
> > radix tree), which are much worse and higher priority than a missed
> > atomic annotation. If we look at codegen we may discard most of them
> > as non important.
>
> Sure. I was not asking to look at codegen as the only signal. But to use the
> signal for whatever it is worth.

But then we need other, stronger signals. We don't have any.
So if the codegen is the only one and it says "this is not important",
then we conclude "this is not important".