Re: KCSAN: data-race in __alloc_file / __alloc_file

From: Marco Elver
Date: Mon Nov 11 2019 - 09:18:09 EST


On Sun, 10 Nov 2019 at 21:44, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> > On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > > In particular, if we make the optimization to do the "read and only
> > > write if changed"
> >
> > It might be useful for checking too. IOW, something like KCSAN could
> > actually check that if a field has an idempotent write to it, all
> > writes always have the same value.
> >
> > Again, there's the issue with lifetime.
> >
> > Part of that is "initialization is different". Those writes would not
> > be marked idempotent, of course, and they'd write another value.
> >
> > There's also the issue of lifetime at the _end_ of the use, of course.
> > There _are_ interesting data races at the end of the lifetime, both
> > reads and writes.
> >
> > In particular, if it's a sticky flag, in order for there to not be any
> > races, all the writes have to happen with a refcount held, and the
> > final read has to happen after the final refcount is dropped (and the
> > refcounts have to have atomicity and ordering, of course). I'm not
> > sure how easy something like that is to model in KCSAN. Maybe it already
> > does things like that for all the other refcount stuff we do.
> >
> > But the lifetime can be problematic for other reasons too - in this
> > particular case we have a union for that sticky flag (which is used
> > under the refcount), and then when the final refcount is released we
> > read that value (thus no data race) but because of the union we will
> > now start using that field with *different* data. It becomes that RCU
> > list head instead.
> >
> > That kind of "it used to be a sticky flag, but now the lifetime of the
> > flag is over, and it's something entirely different" might be a
> > nightmare for something like KCSAN. It sounds complicated to check
> > for, but I have no idea what KCSAN really considers complicated or
> > not.
>
> But will "one size fits all" be practical and useful?
>
> For my code, I would be happy to accept a significant "false positive"
> rate to get even a probabilistic warning of other-task accesses to some
> of RCU's fields. Even if the accesses were perfect from a functional
> viewpoint, they could be problematic from a performance and scalability
> viewpoint. And for something like RCU, real bugs, even those that are
> very improbable, need to be fixed.
>
> But other code (and thus other developers and maintainers) are going to
> have different needs. For all I know, some might have good reasons to
> exclude their code from KCSAN analysis entirely.
>
> Would it make sense for KCSAN to have per-file/subsystem/whatever flags
> specifying the depth of the analysis?

Just to answer this: we already have this, and KCSAN is already disabled
for certain files. So it's an option if required. Maintainers just need
to add KCSAN_SANITIZE := n (everything built by that Makefile) or
KCSAN_SANITIZE_file.o := n (a single object file) to the relevant
Makefile, and KCSAN will not instrument that code.

FWIW we now also have a config option to "ignore repeated writes with
the same value". It may filter data races a little too aggressively,
but the alternative, a precise analysis that tracks lifetimes and
values (and whatever else the rules would require), is simply too
complex. So the current solution avoids reporting cases like the
original report here (__alloc_file), at the cost of some imprecision.
That's probably a reasonable trade-off, given that we already have too
many data races to deal with on syzbot.
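
As a rough illustration of the "ignore repeated writes with the same
value" filter, here is a minimal userspace sketch. It assumes the
filter works by comparing the value seen before and after a watched
access. The struct, the function names, and the value_change_only
switch are all hypothetical and are not KCSAN's actual code or config
symbol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one watched access; not KCSAN's real data structure. */
struct watched_access {
	bool raced;           /* a concurrent access was observed while watching */
	unsigned long before; /* value read when watching started */
	unsigned long after;  /* value re-read when watching ended */
};

/*
 * Decide whether to report a race. value_change_only is a hypothetical
 * stand-in for the config option: when set, races that left the value
 * unchanged are filtered out.
 */
bool should_report(const struct watched_access *a, bool value_change_only)
{
	if (!a->raced)
		return false;
	if (value_change_only && a->before == a->after)
		return false;
	return true;
}

/* Usage example: a sticky-flag style race versus a race that changes the value. */
void example(void)
{
	struct watched_access same = { .raced = true, .before = 1, .after = 1 };
	struct watched_access diff = { .raced = true, .before = 0, .after = 1 };

	assert(!should_report(&same, true));  /* filtered: same value rewritten */
	assert(should_report(&same, false));  /* reported when the filter is off */
	assert(should_report(&diff, true));   /* value changed, always reported */
}
```

The imprecision is visible in the sketch: any race that happens to
leave the value unchanged is dropped, whether or not it is benign, and
nothing about lifetimes is tracked.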

Thanks,
-- Marco