Re: RCU vs data_race()

From: Peter Zijlstra
Date: Sun Jun 20 2021 - 15:16:04 EST


On Fri, Jun 18, 2021 at 01:48:00PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 18, 2021 at 01:26:10PM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 18, 2021 at 10:59:26AM +0200, Marco Elver wrote:
> > > On Fri, Jun 18, 2021 at 10:24AM +0200, Peter Zijlstra wrote:
> > > > Hi Paul,
> > > >
> > > > Due to a merge conflict I had to look at some recent RCU code, and I saw
> > > > you went a little overboard with data_race(). How's something like the
> > > > below look to you?
> > >
> > > I commented below. The main thing is just using the __no_kcsan function
> > > attribute if it's only about accesses within the function (and not
> > > also about called functions elsewhere).
> > >
> > > Using the attribute also improves performance slightly (not that it
> > > matters much in a KCSAN-enabled kernel) due to no instrumentation.
> >
> > Aah yes, of course! Much better still.
> >
> > > > The idea being that we fundamentally don't care about data races for
> > > > debug/error condition prints, so marking every single variable access is
> > > > just clutter.
> > >
> > > Having data_race() around the pr_* helpers seems reasonable, if you
> > > worry about future unnecessary markings that might pop up due to them.
> >
> > Right, so I did them on WARN and higher, figuring that those really only
> > happen when there's something wrong and we're way past caring about
> > data races. Paul has a few pr_info() users that are heavy on
> > data_race(), but your __no_kcsan attribute suggestion helps with that.
>
> But there can be cases where some mutex is supposed to be held across
> updates to one of the fields to be printed, and that mutex is held across
> the pr_*(). In that case, we -want- KCSAN to yell if there is a data
> race involving that field.

I don't buy that argument. pr_err() (or worse) is not supposed to
happen, ever. If it does, *that* is a far worse condition than any data
race KCSAN could possibly find.

So the only way the pr_err() expression itself can lead to KCSAN
reporting a data race is if something far worse already triggered the
pr_err() in the first place.
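A minimal sketch of the statement-level marking being argued for: the whole error print is wrapped in one data_race() instead of marking each field it reads. This assumes userspace stand-ins: data_race() and pr_err() are redefined below and do not behave like the kernel macros (the real data_race() tells KCSAN to ignore the accesses in its expression), and struct gp_state and check_gp() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: the kernel's data_race() suppresses KCSAN
 * reports for the accesses in @expr; here it just evaluates it. */
#define data_race(expr) (expr)

/* Hypothetical state whose fields other CPUs update locklessly. */
struct gp_state {
	long gp_seq;
	long gp_seq_needed;
};

static char last_msg[128];

/* Hypothetical stand-in for pr_err(): format into a buffer, not the log. */
#define pr_err(...) snprintf(last_msg, sizeof(last_msg), __VA_ARGS__)

/*
 * Returns 1 if the "should never happen" condition fired.  The error
 * print is marked once as a whole, rather than wrapping each field
 * read in its own data_race().
 */
static int check_gp(const struct gp_state *s)
{
	if (s->gp_seq > s->gp_seq_needed) {
		data_race(pr_err("gp_seq %ld ahead of gp_seq_needed %ld\n",
				 s->gp_seq, s->gp_seq_needed));
		return 1;
	}
	return 0;
}
```

The trade-off Paul raises applies here too: the single data_race() also hides a race on a field that is supposed to be protected by a lock held across the print.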

> So I am not at all a fan of this change.
>
> But a similar technique might help elsewhere. RCU uses strict
> KCSAN settings (something about me not wanting minor code-generation
> performance issues turning into full-fledged RCU concurrency bugs),
> but invokes code that uses looser settings. This means that RCU gets
> "false-positive" KCSAN complaints on racing calls to (for example)
> schedule_timeout_interruptible().
>
> My thought is to create a rcu_schedule_timeout_interruptible(), for one
> example, that suppresses KCSAN warnings under the assumption that they
> will be caught by KCSAN runs on other parts of the kernel. Among other
> things, this would also allow them to be easily adjusted as appropriate.
>
> Thoughts?

You've lost me on the schedule thing, what?

All I'm saying is that RCU is turning into an unreadable mess of
data_race(), and that marking whole functions, or whole statements that
should only ever execute on actual errors *anyway*, seems to
significantly reduce the clutter.
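A minimal sketch contrasting per-access data_race() markings with the function-level __no_kcsan attribute Marco suggested. It assumes userspace stand-ins: data_race() is reduced to a plain expression, and __no_kcsan is mapped here to the no_sanitize_thread attribute that GCC and Clang accept (it only has an effect under -fsanitize=thread). struct rcu_stats and both dump functions are hypothetical. Both functions produce identical output; only the instrumentation differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-ins.  data_race() marks a single expression as
 * intentionally racy; __no_kcsan drops sanitizer instrumentation from
 * the accesses in the function body (not from functions it calls).
 */
#define data_race(expr) (expr)
#define __no_kcsan __attribute__((no_sanitize_thread))

/* Hypothetical counters updated locklessly elsewhere. */
struct rcu_stats {
	unsigned long qs;
	unsigned long gp_seq;
	unsigned long n_fqs;
};

/* Before: every racy read is marked individually. */
static int dump_stats_marked(char *buf, size_t len, const struct rcu_stats *s)
{
	return snprintf(buf, len, "qs=%lu gp_seq=%lu n_fqs=%lu",
			data_race(s->qs), data_race(s->gp_seq),
			data_race(s->n_fqs));
}

/* After: the function as a whole is exempt from instrumentation. */
static __no_kcsan int dump_stats(char *buf, size_t len, const struct rcu_stats *s)
{
	return snprintf(buf, len, "qs=%lu gp_seq=%lu n_fqs=%lu",
			s->qs, s->gp_seq, s->n_fqs);
}
```

As Marco notes, the attribute covers only the accesses inside the function itself; anything it calls is still instrumented as usual.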