Re: [RFC][PATCH 0/5] arch: atomic rework

From: Linus Torvalds
Date: Sat Feb 22 2014 - 22:50:48 EST


On Sat, Feb 22, 2014 at 4:39 PM, Paul E. McKenney
<paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> Agreed, by far the most frequent use is "->" to dereference and assignment
> to store into a local variable. The other operations where the kernel
> expects ordering to be maintained are:
>
> o Bitwise "&" to strip off low-order bits. The FIB tree does
> this, for example in fib_table_lookup() in net/ipv4/fib_trie.c.
> The low-order bit is used to distinguish internal nodes from
> leaves -- nodes and leaves are different types of structures.
> (There are a few others.)

Note that this is very much outside the scope of the C standard,
regardless of 'consume' or not.

We'll always do things outside the standard, so I wouldn't worry.

> o Uses "?:" to substitute defaults in case of NULL pointers,
> but ordering must be maintained in the non-default case.
> Most, perhaps all, of these could be converted to "if" should
> "?:" prove problematic.

Note that this doesn't actually affect the restrict/ordering rule in
theory: "?:" isn't special according to those rules. The rules are
fairly simple: we guarantee ordering only to the object that the
pointer points to, and even that guarantee goes out the window if
there is some *other* way to reach the object.

?: is not really relevant, except in the sense that *any* expression
that ends up pointing to outside the object will lose the ordering
guarantee. ?: can be one such expression, but so can "p-p" or anything
like that.

And in *practice*, the only thing that needs to be sure to generate
special code is alpha, and there you'd just add the "rmb" after the
load. That is sufficient to fulfill the guarantees.

On ARM and powerpc, the compiler obviously has to guarantee that it
doesn't do value-speculation on the result, but again, that never
really had anything to do with the whole "carries a dependency", it is
really all about the fact that in order to guarantee the ordering, the
compiler mustn't generate that magical aliased pointer value. But if
the aliased pointer value comes from the *source* code, all bets are
off.

Now, even on alpha, the compiler can obviously move that "rmb" around.
For example, if there is a conditional after the
"atomic_read(mo_consume)", and the compiler can tell that the pointer
that got read by mo_consume is dead along one branch, then the
compiler can move the "rmb" to only exist in the other branch. Why?
Because we inherently guarantee only the order to any accesses to the
object the pointer pointed to, and that the pointer that got loaded is
the *only* way to get to that object (in this context), so if the
value is dead, then so is the ordering.

In fact, even if the value is *not* dead, but it is NULL, the compiler
can validly say "the NULL pointer cannot point to any object, so I
don't have to guarantee any serialization". So code like this (writing
alpha assembly, since in practice only alpha will ever care):

ptr = atomic_read(pp, mo_consume);
if (ptr) {
... do something with ptr ..
}
return ptr;

can validly be translated to:

ldq $1,0($2)
beq $1,branch-over
rmb
.. the do-something code using register $1 ..

because the compiler knows that a NULL pointer cannot be dereferenced,
so it can decide to put the rmb in the non-NULL path - even though the
pointer value is still *live* in the other branch (well, the liveness
of a constant value is somewhat debatable, but you get the idea), and
may be used by the caller (but since it is NULL, the "use" can not
include accessing any object, only really testing)

So note how this is actually very different from the "carries
dependency" rule. It's simpler, and it allows much more natural
optimizations.

> o Addition and subtraction to adjust both pointers to and indexes
> into RCU-protected arrays. There are not that many indexes,
> and they could be converted to pointers, but the addition and
> subtraction looks necessary in a some cases.

Addition and subtraction is fine, as long as they stay within the same
object/array.

And realistically, people violate the whole C pointer "same object"
rule all the time. Any time you implement a raw memory allocator,
you'll violate the C standard and you *will* basically be depending on
architecture-specific behavior. So everybody knows that the C "pointer
arithmetic has to stay within the object" is really a fairly made-up
but convenient shorthand for "sure, we know you'll do tricks on
pointer values, but they won't be portable and you may have to take
particular machine representations into account".

> o Array indexing. The value from rcu_dereference() is used both
> before and inside the "[]", interestingly enough.

Well, in the C sense, or in the actual "integer index" sense? Because
technically, a[b] is nothing but *(a+b), so "inside" the "[]" is
strictly speaking meaningless. Inside and outside are just syntactic
sugar.

That said, I agree that the way I phrased things really limits things
to *just* pointers, and if you want to do RCU on integer values (that
get turned into pointers some other way), that would be outside the
spec.

But in practice, the code generation will *work* for non-pointers too
(the exception might be if the compiler actually does the above "NULL
is special, so I know I don't need to order wrt it". Exactly the way
that people do arithmetic operations on pointers that aren't really
covered by the standard (ie arithmetic 'and' to align them etc), the
code still *works*. It may be outside the standard, but everybody does
it, and one of the reasons C is so powerful is exactly that you *can*
do things like that. They won't be portable, and you have to know what
you are doing, but they don't stop working just because they aren't
covered by the standard.

>> - the "restricted pointer" part means that the compiler does not need
>> to worry about serialization to that object through other possible
>> pointers - we have basically promised that the *only* pointer to that
>> object comes from the mo_consume. So that part makes it clear that the
>> "consume" ordering really only is valid wrt that particular pointer
>> load.
>
> That could work, though there are some cases where a multi-linked
> structure is made visible using a single rcu_assign_pointer(), and
> rcu_dereference() is used only for the pointer leading to that
> multi-linked structure, not for the pointers among the elements
> making up that structure. One way to handle this would be to
> require rcu_dereference() to be used within the structure an well
> as upon first traversal to the structure.

I don't see that you would need to protect anything but the first
read, so I think you need rcu_dereference() only on the initial
pointer access.

On architectures where following a pointer is sufficient ordering,
we're fine. On alpha (and, if anybody ever makes that horrid mistake
again), the 'rmb' after the first access will guarantee that all the
later accesses will see the new list.

So basically, 'consume' will end up inserting a barrier (if necessary)
before following the pointer for the first time, but once that barrier
has been inserted, we are now fully ordered wrt the releasing store,
so we're done. No need to order *all* the accesses (even off the same
base pointer), only the first one.

> But now it is time for some bullshit syntax for the RCU-protected arrays
> in the Linux kernel:
>
> p = atomic_load_explicit(gp, memory_order_consume);
> r1 = *p; /* Ordering maintained. */
> r2 = p[5]; /* Ordering maintained? */

Oh yes. It's in the same object. If it wasn't, then "p[5]" itself
would be meaningless and outside the scope of the standard to begin
with.

> r3 = p + 5; /* Ordering maintained? */
> n = get_an_index();
> r4 = p[n]; /* Ordering maintained? */

Yes. By definition. And again, for the very same reason. You would
violate *other* parts of the C standard before you violated the
suggested rule.

Also, remember: a lot of this is about "legalistic guarantees". In
practice, just the fact that a data chain *exists* is guarantee
enough, so from a code generation standpoint, NONE OF THIS MATTERS.

The exception is alpha, where a "mo_consume" load basically needs to
be generated with a "rmb" following it (see above how you can relax
that a _bit_, but not much). Trivial.

So code generation is actually *trivial*. Ordering is maintained
*automatically*, with no real effort on the side of the compiler.

The only issues really are:

- the current C standard language implies *too much* ordering. The
whole "carries dependency" is broken. The practical example - even
without using "?:" or any conditionals or any function calls - is just
that

ptr = atomic_read(pp, mo_consume);
value = array[ptr-ptr];

and there really isn't any sane ordering there. But the current
standard clearly says there is. The current standard is wrong.

- my trick of saying that it is only ordered wrt accesses to the
object the pointer points to gets rid of that whole bogus false
dependency.

- the "restricted pointer" thing is just legalistic crap, and has no
actual meaning for code generation, it is _literally_ just that if the
programmer does value speculation by hand:

ptr = atomic_read(pp, mo_consume);
value = object.val;
if (ptr != &object)
value = ptr->val;

then in this situation the compiler does *not* need to worry about
the ordering to the "object.val" access, because it was gotten through
an alias.

So that "restrict" thing is really just a way to say "the ordering
only exists through that single pointer". That's not really what
restrict was _designed_ for, but to quote the standard:

"An object that is accessed through a restrict-qualiïed pointer has a
special association
with that pointer. This association, deïned in 6.7.3.1 below,
requires that all accesses to
that object use, directly or indirectly, the value of that particular pointer"

that's not the *definition* of "restrict", and quite frankly, the
actual language that talks about "restrict" really talks about being
able to know that nobody *modifies* the value through another pointer,
so I'm clearly stretching things a bit. But anybody reading the above
quote from the standard hopefully agrees that I'm not stretching it
all that much.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/