Re: RCU vs data_race()

From: Peter Zijlstra
Date: Tue Jul 06 2021 - 04:37:56 EST


On Mon, Jun 21, 2021 at 06:37:57AM -0700, Paul E. McKenney wrote:
> Because the static inline functions end up in RCU's tranlation units,
> and they do write to other state. See for example the line marked with
> the asterisk at the end of this email, which is apparently between
> __run_timers() and rcu_torture_reader(). But gdb says that this is
> really between this statement in __run_timers():
>
> base->running_timer = NULL;

(I'm finding that in expire_timers(), not in __run_timers(), and both
are in kernel/time/timer.c)

> And this statement in try_to_del_timer_sync():
>
> if (base->running_timer != timer)
>
> Because of inline functions, running KCSAN on RCU can detect races in
> other subsystems. In this case, I could argue for a READ_ONCE() on the
> "if" condition, but last I knew, the rules for timers are that C-language
> writes do not participate in data races. So maybe I should add that
> READ_ONCE(), but running KCSAN on rcutorture would still complain.

That code is correct as is, AFAICT, so READ_ONCE() is not the right
annotation. Also, READ_ONCE() would not actively help the situation in
any case, and arguably make the code worse and more confusing.


What happens here is that we read base->running_timer while holding
base->lock. We set base->running_timer to !0 holding that same
base->lock from the timer IRQ context. We clear base->running_timer
*without* base->lock, from the timer IRQ context.

So yes, there's a race between the locked load of base->running_timer
and the timer IRQ on another CPU clearing it.

But since the comparison is an equality test and the only purpose of the
clear is to destroy that equality, any partial clear serves that
purpose.

Now, a partial clear could create a false positive match for another
timer, which in turn could make try_to_del_timer_sync() fail spuriously
I suppose, but any caller needs to be able to deal with failure of that
function, so no harm done there.


*HOWEVER* timer_curr_running() violates all that.. but that looks like
it's only ever used as a diagnostic, so that should be fine too.


Anyway, AFAIU the problem is on the WRITE side, so any READ_ONCE() will
not ever fix anything here. If we want to get rid of the possible
spurious fail in try_to_del_timer_sync() we need to consistently
WRITE/READ_ONCE() all of base->running_timer, if we don't care, we need
a data_race() annotation somewhere, although I'm not currently seeing
that inlining you mention that's needed for this problem to manifest in
the first place.