Re: Kernel Concurrency Sanitizer (KCSAN)

From: Andrea Parri
Date: Wed Oct 09 2019 - 16:17:19 EST


On Wed, Oct 09, 2019 at 09:45:50AM +0200, Dmitry Vyukov wrote:
> On Sat, Oct 5, 2019 at 6:16 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Sat, Oct 5, 2019 at 2:58 AM Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> > > > This one is tricky. What I think we need to avoid is an onslaught of
> > > > patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> > > > code being modified. My worry is that Joe Developer is eager to get their
> > > > first patch into the kernel, so runs this tool and starts spamming
> > > > maintainers with these things to the point that they start ignoring KCSAN
> > > > reports altogether because of the time they take up.
> > > >
> > > > I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> > > > to have a comment describing the racy access, a bit like we do for memory
> > > > barriers. Another possibility would be to use atomic_t more widely if
> > > > there is genuine concurrency involved.
> > > >
> > >
> > > About READ_ONCE() and WRITE_ONCE(), we will probably need
> > >
> > > ADD_ONCE(var, value) for arches that can implement the RMW in a single instruction.
> > >
> > > WRITE_ONCE(var, var + value) does not look pretty, and increases register pressure.
> >
> > FWIW modern compilers can handle this if we tell them what we are trying to do:
> >
> > void foo(int *p, int x)
> > {
> > 	x += __atomic_load_n(p, __ATOMIC_RELAXED);
> > 	__atomic_store_n(p, x, __ATOMIC_RELAXED);
> > }
> >
> > $ clang test.c -c -O2 && objdump -d test.o
> >
> > 0000000000000000 <foo>:
> >    0:	01 37	add    %esi,(%rdi)
> >    2:	c3	retq
> >
> > We can have syntactic sugar on top of this of course.
>
> An interesting precedent came up in another KCSAN bug report. Namely,
> it may be reasonable for a compiler to use different optimization
> heuristics for concurrent and non-concurrent code. Consider there are
> some legal code transformations, but it's unclear if they are
> profitable or not. It may be the case that for non-concurrent code the
> expectation is that it's a profitable transformation, but for
> concurrent code it is not. So that may be another reason to
> communicate to the compiler what we want to do, rather than trying to
> trick and play against each other. I've added the concrete example
> here:
> https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
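
For reference, the "syntactic sugar" mentioned in the quoted text might look
roughly like the macro below. This is only a userspace sketch built on the
same GCC/Clang __atomic builtins as the quoted foo(); the name ADD_ONCE comes
from Eric's suggestion, but this definition and the selftest helper are
hypothetical and are not existing kernel code.

```c
#include <assert.h>

/*
 * Hypothetical ADD_ONCE(): a relaxed load followed by a relaxed store,
 * mirroring the quoted foo().  Both accesses are marked, but the
 * read-modify-write as a whole is NOT atomic: a concurrent update
 * landing between the load and the store can be lost.
 */
#define ADD_ONCE(var, value)						\
	__atomic_store_n(&(var),					\
		__atomic_load_n(&(var), __ATOMIC_RELAXED) + (value),	\
		__ATOMIC_RELAXED)

/* Single-threaded sanity check of the macro (illustration only). */
static void add_once_selftest(void)
{
	int counter = 0;

	ADD_ONCE(counter, 5);
	ADD_ONCE(counter, 37);
	assert(counter == 42);
}
```

Where lost updates matter, a real atomic RMW (atomic_t and friends, as
suggested earlier in the thread) would be needed instead.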

Unrelated, but maybe worth pointing out/for reference: I think that
the section discussing the LKMM,

https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-is-required-for-kernel-memory-model ,

might benefit from a revision/an update; in particular, the statement
"The Kernel Memory Consistency Model requires marking of all shared
accesses" now seems quite inaccurate to me, cf., e.g.,

d1a84ab190137 ("tools/memory-model: Add definitions of plain and marked accesses")
0031e38adf387 ("tools/memory-model: Add data-race detection")

and

https://lkml.kernel.org/r/Pine.LNX.4.44L0.1910011338240.1991-100000@xxxxxxxxxxxxxxxxxxxx .
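
To illustrate the plain/marked distinction, here is a rough userspace
approximation of a message-passing pattern. The accesses to 'data' are
plain (unmarked), yet they should not constitute a data race, because they
are ordered by the release/acquire pair on 'flag'. The __atomic builtins
merely stand in for smp_store_release()/smp_load_acquire(), and all names
below are made up for the example.

```c
#include <assert.h>

static int data;	/* accessed with plain loads/stores only */
static int flag;	/* accessed with marked (atomic) operations only */

static void producer(int value)
{
	data = value;					/* plain write */
	__atomic_store_n(&flag, 1, __ATOMIC_RELEASE);	/* marked: publish */
}

/* Returns 1 and fills *out if the producer has published, 0 otherwise. */
static int consumer(int *out)
{
	if (!__atomic_load_n(&flag, __ATOMIC_ACQUIRE))	/* marked: observe */
		return 0;
	*out = data;	/* plain read, ordered after the plain write */
	return 1;
}
```

The plain read of 'data' is only reached once the acquire load has seen the
release store, which is what orders it after the plain write.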

Thanks,
Andrea