Re: rcu_read_lock lost its compiler barrier

From: Paul E. McKenney
Date: Mon Jun 03 2019 - 15:57:07 EST

On Mon, Jun 03, 2019 at 09:07:29AM -0700, Linus Torvalds wrote:
> On Mon, Jun 3, 2019 at 8:55 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I don't believe that it would necessarily help to turn a
> > rcu_read_lock() into a compiler barrier, because for the non-preempt
> > case rcu_read_lock() doesn't need to actually _do_ anything, and
> > anything that matters for the RCU read lock will already be a compiler
> > barrier for other reasons (ie a function call that can schedule).
>
> Actually, thinking a bit more about this, and trying to come up with
> special cases, I'm not at all convinced.
>
> Even if we don't have preemption enabled, it turns out that we *do*
> have things that can cause scheduling without being compiler barriers.
>
> In particular, user accesses are not necessarily full compiler
> barriers. One common pattern (x86) is
>
>         asm volatile("call __get_user_%P4"
>
> which explicitly has an "asm volatile" so that it doesn't re-order wrt
> other asms (and thus other user accesses), but it does *not* have a
> "memory" clobber, because the user access doesn't actually change
> kernel memory. Not even if it's a "put_user()".
>
> So we've made those fairly relaxed on purpose. And they might be
> relaxed enough that they'd allow re-ordering wrt something that does a
> rcu read lock, unless the rcu read lock has some compiler barrier in
> it.
>
> IOW, imagine completely made up code like
>
>         get_user(val, ptr);
>         rcu_read_lock();
>         WRITE_ONCE(state, 1);
>
> and unless the rcu lock has a barrier in it, I actually think that
> write to 'state' could migrate to *before* the get_user().
>
> I'm not convinced we have anything that remotely looks like the above,
> but I'm actually starting to think that yes, all RCU barriers had
> better be compiler barriers.
>
> Because this is very much an example of something where you don't
> necessarily need a memory barrier, but there's a code generation
> barrier needed because of local ordering requirements. The possible
> faulting behavior of "get_user()" must not migrate into the RCU
> critical region.
>
> Paul?

I agree that !PREEMPT rcu_read_lock() would not affect compiler code
generation, but given that get_user() is a volatile asm, isn't the
compiler already forbidden from reordering it with the volatile-cast
WRITE_ONCE() access, even if there were nothing at all between them?
Or are asms an exception to the rule that volatile accesses cannot
be reordered with respect to each other?
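To make the question concrete, here is a minimal userspace sketch of the made-up sequence, assuming GCC/Clang extended asm. barrier(), READ_ONCE() and WRITE_ONCE() are simplified versions of the kernel macros. fake_uaccess(), toy_rcu_read_lock(), toy_rcu_read_unlock() and made_up_example() are hypothetical names for illustration only, and fake_uaccess() merely mimics the shape of the x86 get_user() asm (volatile, no "memory" clobber) without performing any access.

```c
/* Kernel-style compiler barrier: emits no instructions, but the "memory"
 * clobber forbids the compiler from moving memory accesses across it or
 * keeping memory values cached in registers across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Simplified READ_ONCE()/WRITE_ONCE(): volatile accesses that the compiler
 * must emit exactly once (the real kernel macros handle more cases). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the x86 get_user() asm: volatile and takes the
 * user pointer as an input, but has no "memory" clobber, so the compiler
 * assumes it neither reads nor writes memory. The body is empty. */
static inline void fake_uaccess(const int *uptr)
{
	__asm__ __volatile__("" : : "r"(uptr));
}

/* Toy !PREEMPT read-side markers that are pure compiler barriers, as
 * suggested in the quoted mail. The alternative being discussed is an
 * empty function body, which constrains nothing. */
static inline void toy_rcu_read_lock(void)   { barrier(); }
static inline void toy_rcu_read_unlock(void) { barrier(); }

int state;

/* The made-up sequence from the quoted mail. */
void made_up_example(const int *uptr)
{
	fake_uaccess(uptr);    /* plays the role of get_user(val, ptr) */
	toy_rcu_read_lock();   /* memory clobber: WRITE_ONCE() stays below */
	WRITE_ONCE(state, 1);
	toy_rcu_read_unlock();
}
```

With an empty toy_rcu_read_lock(), whether the WRITE_ONCE() may be hoisted above fake_uaccess() depends entirely on how the compiler orders a volatile asm against a volatile access, which is the open question above. The barrier() variant instead relies on the "memory" clobber plus the asm-to-asm ordering described in the quoted mail.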

> So I think the rule really should be: every single form of locking
> that has any semantic meaning at all, absolutely needs to be at least
> a compiler barrier.
>
> (That "any semantic meaning" weaselwording is because I suspect that
> we have locking that truly and intentionally becomes no-ops because
> it's based on things that aren't relevant in some configurations. But
> generally compiler barriers are really pretty damn cheap, even from a
> code generation standpoint, and can help make the resulting code more
> legible, so I think we should not try to aggressively remove them
> without _very_ good reasons.)

We can of course put them back in, but this won't help in the typical
rcu_assign_pointer(), rcu_dereference(), and synchronize_rcu() situation
(nor do I see how it helps in Hubert's example). And in other RCU
use cases, the accesses analogous to the rcu_assign_pointer() and
rcu_dereference() (in Hubert's example, the accesses to variable "a")
really need to be READ_ONCE()/WRITE_ONCE() or stronger, correct?

Thanx, Paul