Re: [PATCH RFC] Make rcu_dereference_raw() safe for NMI etc.
From: Paul E. McKenney
Date: Tue Feb 03 2015 - 08:39:34 EST
On Tue, Feb 03, 2015 at 12:00:40PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 02, 2015 at 11:55:33AM -0800, Paul E. McKenney wrote:
> > As promised/threatened on IRC.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: Reverse rcu_dereference_check() conditions
> >
> > The rcu_dereference_check() family of primitives evaluates the RCU
> > lockdep expression first, and only then evaluates the expression passed
> > in. This works fine normally, but can potentially fail in environments
> > (such as NMI handlers) where lockdep cannot be invoked. The problem is
> > that even if the expression passed in is "1", the compiler would need to
> > prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
> > is free of side effects in order to be able to elide it. Given that
> > rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
> > always use this optimization.
> >
> > This commit therefore reverse the order of evaluation, so that the
> > expression passed in is evaluated first, and the RCU lockdep expression is
> > evaluated only if the passed-in expression evaluated to false, courtesy
> > of the C-language short-circuit boolean evaluation rules. This compells
> > the compiler to forego executing the RCU lockdep expression in cases
> > where the passed-in expression evaluates to "1" at compile time, so that
> > (for example) rcu_dereference_raw() can be guaranteed to execute safely
> > withing an NMI handler.
>
> My particular worry yesterday was tracing; I was looking at
> rcu_read_{,un}lock_notrace() and wondered what would happen if I used
> list_for_each_entry_rcu() under it.
>
> _If_ it would indeed do that call, we can end up in:
>
> list_entry_rcu() -> rcu_dereference_raw() -> rcu_dereference_check()
> -> rcu_read_lock_held() -> rcu_lockdep_current_cpu_online()
> -> preempt_disable()
>
> And preempt_disable() is a traceable thing -- not to mention half the
> callstack above doesn't have notrace annotations and would equally
> generate function trace events.
>
> Thereby rendering the rcu list ops unsuitable for using under _notrace()
> rcu primitives.
>
> So yes, fully agreed on this patch.
>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Applied your Acked-by, thank you!
> FWIW I think I won't be needing the rcu _notrace() bits (for now), but
> it leading to this patch was worth it anyhow ;-)
No argument here! This could have been a nasty one to track down,
depending on exactly how it manifested. Much easier this way! ;-)
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/