Re: RCU vs data_race()

From: Paul E. McKenney
Date: Tue Jul 06 2021 - 10:54:32 EST


On Tue, Jul 06, 2021 at 10:37:49AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 21, 2021 at 06:37:57AM -0700, Paul E. McKenney wrote:
> > Because the static inline functions end up in RCU's tranlation units,
> > and they do write to other state. See for example the line marked with
> > the asterisk at the end of this email, which is apparently between
> > __run_timers() and rcu_torture_reader(). But gdb says that this is
> > really between this statement in __run_timers():
> >
> > base->running_timer = NULL;
>
> (I'm finding that in expire_timers(), not in __run_timers(), and both
> are in kernel/time/timer.c)
>
> > And this statement in try_to_del_timer_sync():
> >
> > if (base->running_timer != timer)
> >
> > Because of inline functions, running KCSAN on RCU can detect races in
> > other subsystems. In this case, I could argue for a READ_ONCE() on the
> > "if" condition, but last I knew, the rules for timers are that C-language
> > writes do not participate in data races. So maybe I should add that
> > READ_ONCE(), but running KCSAN on rcutorture would still complain.
>
> That code is correct as is, AFAICT, so READ_ONCE() is not the right
> annotation. Also, READ_ONCE() would not actively help the situation in
> any case, and arguably make the code worse and more confusing.
>
>
> What happens here is that we read base->running_timer while holding
> base->lock. We set base->running_timer to !0 holding that same
> base->lock from the timer IRQ context. We clear base->running_timer
> *without* base->lock, from the timer IRQ context.
>
> So yes, there's a race between the locked load of base->running_timer
> and the timer IRQ on another CPU clearing it.
>
> But since the comparison is an equality test and the only purpose of the
> clear is to destroy that equality, any partial clear serves that
> purpose.
>
> Now, a partial clear could create a false positive match for another
> timer, which in turn could make try_to_del_timer_sync() fail spuriously
> I suppose, but any caller needs to be able to deal with failure of that
> function, so no harm done there.
>
>
> *HOWEVER* timer_curr_running() violates all that.. but that looks like
> it's only ever used as a diagnostic, so that should be fine too.

Maybe a data_race(), then? Except that in current mainline (as opposed
to v5.13) timer_curr_running() seems to have been inlined.

> Anyway, AFAIU the problem is on the WRITE side, so any READ_ONCE() will
> not ever fix anything here. If we want to get rid of the possible
> spurious fail in try_to_del_timer_sync() we need to consistently
> WRITE/READ_ONCE() all of base->running_timer, if we don't care, we need
> a data_race() annotation somewhere, although I'm not currently seeing
> that inlining you mention that's needed for this problem to manifest in
> the first place.

Coming back to my original question, is my best strategy for RCU
to create data_race()-wrapped variants of the schedule_timeout*()
functions?

Thanx, Paul