Re: RCU vs data_race()

From: Paul E. McKenney
Date: Sun Jun 20 2021 - 17:01:38 EST


On Sun, Jun 20, 2021 at 09:14:28PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 18, 2021 at 01:48:00PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 18, 2021 at 01:26:10PM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 18, 2021 at 10:59:26AM +0200, Marco Elver wrote:
> > > > On Fri, Jun 18, 2021 at 10:24AM +0200, Peter Zijlstra wrote:
> > > > > Hi Paul,
> > > > >
> > > > > Due to a merge conflict I had to look at some recent RCU code, and I saw
> > > > > you went a little overboard with data_race(). How's something like the
> > > > > below look to you?
> > > >
> > > > I commented below. The main thing is just using the __no_kcsan function
> > > > attribute if it's only about accesses within the function (and not
> > > > also about called functions elsewhere).
> > > >
> > > > Using the attribute also improves performance slightly (not that it
> > > > matters much in a KCSAN-enabled kernel) due to no instrumentation.
> > >
> > > Aah yes of course! Much better still.
> > >
> > > > > The idea being that we fundamentally don't care about data races for
> > > > > debug/error condition prints, so marking every single variable access is
> > > > > just clutter.
> > > >
> > > > Having data_race() around the pr_* helpers seems reasonable, if you
> > > > worry about future unnecessary markings that might pop up due to them.
> > >
> > > Right, so I did them on WARN and higher, figuring that those really only
> > > happen when there's something wrong and we're way past caring about
> > > data races. Paul has a few pr_info() users that are heavy on
> > > data_race(), but your __no_kcsan attribute suggestion helps with that.
> >
> > But there can be cases where some mutex is supposed to be held across
> > updates to one of the fields to be printed, and that mutex is held across
> > the pr_*(). In that case, we -want- KCSAN to yell if there is a data
> > race involving that field.
>
> I don't buy that argument. pr_err() (or worse) is not supposed to
> happen, ever. If it does, *that* is a far worse condition than any data
> race possibly found by kcsan.
>
> So the only way the pr_err() expression itself can lead to kcsan
> reporting a data race is if something far worse triggered the pr_err()
> in the first place.

Earlier, you said pr_warn(). Above, I said pr_*(). Now you say
pr_err(). But OK...

Let's take for example the pr_err() in __call_rcu(), that is, the
double-free diagnostic. A KCSAN warning on the unmarked load from
head->func could give valuable information on the whereabouts of the
other code interfering with the callback. Blanket disabling of KCSAN
across all pr_err() calls (let alone all pr_*() calls) would be the
opposite of helpful.
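A userspace sketch of the three options being debated, applied to a load like the head->func load in the __call_rcu() diagnostic. The macro bodies are assumptions: they mimic a build without KCSAN, where data_race(expr) simply evaluates expr and __no_kcsan has no effect. struct cb_head, cb_func_t, sample_cb() and the diag_load_*() helpers are hypothetical names used only for illustration.

```c
/*
 * Stand-ins for the kernel macros as they behave in a build without
 * KCSAN (assumption for this sketch). With CONFIG_KCSAN=y, data_race()
 * tells KCSAN to ignore the accesses made while evaluating its
 * argument, and __no_kcsan keeps the compiler from instrumenting the
 * function it is applied to.
 */
#define data_race(expr) (expr)
#define __no_kcsan

/* Hypothetical callback head, loosely modeled on struct rcu_head. */
struct cb_head;
typedef void (*cb_func_t)(struct cb_head *head);

struct cb_head {
	cb_func_t func;
};

/* Plain load: still instrumented, so KCSAN can report a racing writer. */
static cb_func_t diag_load_plain(const struct cb_head *head)
{
	return head->func;
}

/* Marked load: KCSAN ignores only this one expression. */
static cb_func_t diag_load_marked(const struct cb_head *head)
{
	return data_race(head->func);
}

/*
 * Whole-function opt-out: no access in this body is instrumented,
 * but any functions it calls are still checked.
 */
static __no_kcsan cb_func_t diag_load_unchecked(const struct cb_head *head)
{
	return head->func;
}

/* Hypothetical callback, used only to give head->func a value. */
static void sample_cb(struct cb_head *head)
{
	(void)head;
}
```

All three helpers return the same pointer here. The difference only appears with CONFIG_KCSAN=y: a racing plain write to head->func could then still be reported against diag_load_plain(), along with where the interfering code lives, but not against the other two helpers.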

> > So I am not at all a fan of this change.
> >
> > But a similar technique might help elsewhere. RCU uses strict
> > KCSAN settings (something about me not wanting minor code-generation
> > performance issues turning into full-fledged RCU concurrency bugs),
> > but invokes code that uses looser settings. This means that RCU gets
> > "false-positive" KCSAN complaints on racing calls to (for example)
> > schedule_timeout_interruptible().
> >
> > My thought is to create a rcu_schedule_timeout_interruptible(), for one
> > example, that suppresses KCSAN warnings under the assumption that they
> > will be caught by KCSAN runs on other parts of the kernel. Among other
> > things, this would also allow them to be easily adjusted as appropriate.
> >
> > Thoughts?
>
> You've lost me on the schedule thing, what?

The definition of schedule_timeout_interruptible() is in part of the
kernel that uses much looser KCSAN checking. Thus there are some
KCSAN warnings from RCU involving schedule_timeout_interruptible().
But at least some of these warnings are for conflicting writes, for
which many parts of the kernel suppress KCSAN warnings.

So wrappers for a few such functions could get RCU to a clean KCSAN run
without my having to push not-yet-wanted markings into other parts of
the kernel.
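To make the wrapper idea concrete, here is a userspace sketch. rcu_schedule_timeout_interruptible() is only the hypothetical name proposed above, not an existing kernel function. The stubbed kcsan_disable_current()/kcsan_enable_current() stand in for the kernel functions of those names, and the stubbed schedule_timeout_interruptible() does not sleep. The point is only the shape: KCSAN checking of the current task is switched off around a call into code that is held to looser KCSAN settings.

```c
/*
 * Userspace stand-ins (assumptions for this sketch). In the kernel,
 * kcsan_disable_current()/kcsan_enable_current() adjust a per-task
 * nesting count that makes KCSAN skip the current task's accesses,
 * and schedule_timeout_interruptible() sleeps for up to @timeout
 * jiffies.
 */
static int kcsan_disable_count;     /* models KCSAN's per-task disable count */
static int kcsan_checking_was_off;  /* records what the callee observed */

static void kcsan_disable_current(void) { kcsan_disable_count++; }
static void kcsan_enable_current(void)  { kcsan_disable_count--; }

static long schedule_timeout_interruptible(long timeout)
{
	/* Note whether KCSAN would be ignoring this task right now. */
	kcsan_checking_was_off = kcsan_disable_count > 0;
	(void)timeout;
	return 0; /* stub: pretend the whole timeout elapsed */
}

/*
 * Hypothetical RCU-local wrapper: KCSAN skips this task's accesses
 * inside the callee, on the assumption that KCSAN runs over the
 * scheduler code itself will catch genuine bugs there.
 */
static long rcu_schedule_timeout_interruptible(long timeout)
{
	long ret;

	kcsan_disable_current();
	ret = schedule_timeout_interruptible(timeout);
	kcsan_enable_current();
	return ret;
}
```

Keeping the suppression inside one RCU-local wrapper per function means it can later be tightened or dropped in one place, without touching the callee's subsystem.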

> All I'm saying is that RCU is turning into an unreadable mess of
> data_race(), and marking whole functions or whole statements that should
> only ever happen on actual errors *anyway*, seems to significantly
> reduce the clutter.

Sorry, but no. Between your eyes and my sanity, my sanity wins.

Thanx, Paul