Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Tue Feb 18 2014 - 10:57:30 EST


On Mon, 2014-02-17 at 19:00 -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 12:18:21PM -0800, Linus Torvalds wrote:
> > On Mon, Feb 17, 2014 at 11:55 AM, Torvald Riegel <triegel@xxxxxxxxxx> wrote:
> > >
> > > Which example do you have in mind here? Haven't we resolved all the
> > > debated examples, or did I miss any?
> >
> > Well, Paul seems to still think that the standard possibly allows
> > speculative writes or possibly value speculation in ways that break
> > the hardware-guaranteed orderings.
>
> It is not that I know of any specific problems, but rather that I
> know I haven't looked under all the rocks. Plus my impression from
> my few years on the committee is that the standard will be pushed to
> the limit when it comes time to add optimizations.
>
> One example that I learned about last week uses the branch-prediction
> hardware to validate value speculation. And no, I am not at all a fan
> of value speculation, in case you were curious. However, it is still
> an educational example.
>
> This is where you start:
>
> p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
> do_something(p->a, p->b, p->c);
> p->d = 1;

I assume that's the source code.

> Then you leverage branch-prediction hardware as follows:
>
> p = gp.load_explicit(memory_order_consume); /* AKA rcu_dereference() */
> if (p == GUESS) {
>         do_something(GUESS->a, GUESS->b, GUESS->c);
>         GUESS->d = 1;
> } else {
>         do_something(p->a, p->b, p->c);
>         p->d = 1;
> }

I assume that this is a potential transformation by a compiler.

> The CPU's branch-prediction hardware squashes speculation in the case where
> the guess was wrong, and this prevents the speculative store to ->d from
> ever being visible. However, the then-clause breaks dependencies, which
> means that the loads -could- be speculated, so that do_something() gets
> passed pre-initialization values.
>
> Now, I hope and expect that the wording in the standard about dependency
> ordering prohibits this sort of thing. But I do not yet know for certain.

The transformation would be incorrect. p->a in the source code carries
a dependency, and as you say, the transformed code wouldn't have that
dependency any more. The transformed code would thus lose ordering
constraints that it has in the abstract machine. In the absence of
other proofs of correctness based on properties not shown in the
example, the transformed code could therefore exhibit behavior that the
abstract machine does not allow.

If the transformation were instead made by a programmer, then the result
wouldn't do the same as the first example, because mo_consume ordering
doesn't carry through the if statement: the accesses via GUESS do not
carry a dependency from the load of gp.
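
To make the two cases concrete, here is a minimal C++11 sketch of both
forms written out by hand. It is only an illustration under assumptions:
Node, publish(), reader_source(), reader_guess() and the sum-returning
do_something() are hypothetical names that do not appear in the example
above, and the load is spelled gp.load(std::memory_order_consume), the
standard member function, in place of load_explicit.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload type; field names follow the example in the thread.
struct Node {
    int a, b, c, d;
};

// Hypothetical stand-in for do_something(); returns a value so that the
// result can be checked.
int do_something(int a, int b, int c) { return a + b + c; }

// Publisher side: initialize the node, then publish it with release.
void publish(std::atomic<Node*>& gp, Node* n, int a, int b, int c) {
    n->a = a;
    n->b = b;
    n->c = c;
    n->d = 0;
    gp.store(n, std::memory_order_release);
}

// Source form: every access goes through p, so each one carries a
// dependency from the consume load.
int reader_source(std::atomic<Node*>& gp) {
    Node* p = gp.load(std::memory_order_consume);
    assert(p != nullptr);
    int r = do_something(p->a, p->b, p->c);
    p->d = 1;
    return r;
}

// Hand-written "guess" form: in the then-branch the accesses go through
// guess rather than p, so they carry no dependency from the load of gp.
// The comparison p == guess does not create one.
int reader_guess(std::atomic<Node*>& gp, Node* guess) {
    Node* p = gp.load(std::memory_order_consume);
    assert(p != nullptr);
    if (p == guess) {
        int r = do_something(guess->a, guess->b, guess->c);
        guess->d = 1;
        return r;
    }
    int r = do_something(p->a, p->b, p->c);
    p->d = 1;
    return r;
}
```

Run single-threaded, both readers return the same value, which is why
the difference is easy to overlook. Only with a concurrent publisher
does the missing dependency in the then-branch of reader_guess() matter.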

Are there other specific concerns that you have regarding this example?
