Re: KCSAN: data-race in __alloc_file / __alloc_file

From: Paul E. McKenney
Date: Sun Nov 10 2019 - 15:44:46 EST


On Sun, Nov 10, 2019 at 11:20:53AM -0800, Linus Torvalds wrote:
> On Sun, Nov 10, 2019 at 11:12 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > And this is where WRITE_IDEMPOTENT would make a possible difference.
> > In particular, if we make the optimization to do the "read and only
> > write if changed"
>
> It might be useful for checking too. IOW, something like KCSAN could
> actually check that if a field has an idempotent write to it, all
> writes always have the same value.
>
> Again, there's the issue with lifetime.
>
> Part of that is "initialization is different". Those writes would not
> be marked idempotent, of course, and they'd write another value.
>
> There's also the issue of lifetime at the _end_ of the use, of course.
> There _are_ interesting data races at the end of the lifetime, both
> reads and writes.
>
> In particular, if it's a sticky flag, in order for there to not be any
> races, all the writes have to happen with a refcount held, and the
> final read has to happen after the final refcount is dropped (and the
> refcounts have to have atomicity and ordering, of course). I'm not
> sure how easy something like that is to model in KCSAN. Maybe it already
> does things like that for all the other refcount stuff we do.
>
> But the lifetime can be problematic for other reasons too - in this
> particular case we have a union for that sticky flag (which is used
> under the refcount), and then when the final refcount is released we
> read that value (thus no data race) but because of the union we will
> now start using that field with *different* data. It becomes that RCU
> list head instead.
>
> That kind of "it used to be a sticky flag, but now the lifetime of the
> flag is over, and it's something entirely different" might be a
> nightmare for something like KCSAN. It sounds complicated to check
> for, but I have no idea what KCSAN really considers complicated or
> not.
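
For concreteness, here is a rough user-space sketch of the two ideas
above: the "read and only write if changed" store, and a sticky flag
whose final read happens only after the last reference is dropped.
Everything in it is an assumption made for illustration.
write_idempotent(), struct obj, obj_mark(), and obj_put() are
hypothetical names, not existing kernel interfaces, and C11 relaxed
atomics stand in for READ_ONCE()/WRITE_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical idempotent store: every caller is expected to pass the
 * same value.  Read first and skip the store when the value is already
 * there, so that repeated calls do not dirty the cache line.
 * Returns true if a store was actually done.
 */
static inline bool write_idempotent(_Atomic unsigned int *p, unsigned int val)
{
	if (atomic_load_explicit(p, memory_order_relaxed) == val)
		return false;
	atomic_store_explicit(p, val, memory_order_relaxed);
	return true;
}

/* Hypothetical refcounted object carrying a sticky flag. */
struct obj {
	atomic_int refcount;
	_Atomic unsigned int sticky;
};

/* Caller must hold a reference, so the flag is still "live". */
static inline void obj_mark(struct obj *o)
{
	write_idempotent(&o->sticky, 1);
}

/*
 * Drop one reference.  Only the caller that drops the final reference
 * reads the flag.  The acq_rel decrement orders all earlier marked
 * writes (done under a reference) before that final read.
 * Returns true if this was the final reference, with *flag filled in.
 */
static inline bool obj_put(struct obj *o, unsigned int *flag)
{
	if (atomic_fetch_sub_explicit(&o->refcount, 1,
				      memory_order_acq_rel) != 1)
		return false;
	*flag = atomic_load_explicit(&o->sticky, memory_order_relaxed);
	return true;
}
```

A checker could in principle verify that every write_idempotent() to
a given field passes the same value, but the sketch says nothing
about what happens once the storage is reused through a union after
the final obj_put().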

But will "one size fits all" be practical and useful?

For my code, I would be happy to accept a significant "false positive"
rate to get even a probabilistic warning of other-task accesses to some
of RCU's fields. Even if the accesses were perfect from a functional
viewpoint, they could be problematic from a performance and scalability
viewpoint. And for something like RCU, real bugs, even those that are
very improbable, need to be fixed.

But other code (and thus other developers and maintainers) is going to
have different needs. For all I know, some might have good reasons to
exclude their code from KCSAN analysis entirely.

Would it make sense for KCSAN to have per-file/subsystem/whatever flags
specifying the depth of the analysis?

Thanx, Paul