Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Thu Feb 13 2014 - 23:43:57 EST


On Thu, 2014-02-13 at 18:01 -0800, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
> > On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
> > > > On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
> > > > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > We Linux-kernel hackers will often need to use volatile semantics in
> > > > > combination with C11 atomics. The C11 atomics do cover
> > > > > some of the reasons we currently use ACCESS_ONCE(), but not all of them --
> > > > > in particular, they allow load/store merging.
> > > >
> > > > I really disagree with the "will need to use volatile".
> > > >
> > > > We should never need to use volatile (outside of whatever MMIO we do
> > > > using C) if C11 defines atomics correctly.
> > > >
> > > > Allowing load/store merging is *fine*. All sane CPUs do that anyway -
> > > > it's called a cache - and there's no actual reason to think that
> > > > "ACCESS_ONCE()" has to mean our current "volatile".
> > > >
> > > > Now, it's possible that the C standards simply get atomics _wrong_, so
> > > > that they create visible semantics that are different from what a CPU
> > > > cache already does, but that's a plain bug in the standard if so.
> > > >
> > > > But merging loads and stores is fine. And I *guarantee* it is fine,
> > > > exactly because CPUs already do it, so claiming that the compiler
> > > > couldn't do it is just insanity.
> > >
> > > Agreed, both CPUs and compilers can merge loads and stores. But CPUs
> > > normally get their stores pushed through the store buffer in reasonable
> > > time, and CPUs also use things like invalidations to ensure that a
> > > store is seen in reasonable time by readers. Compilers don't always
> > > have these two properties, so we do need to be more careful of load
> > > and store merging by compilers.
> >
> > The standard's _wording_ is a little vague about forward-progress
> > guarantees, but I believe the vast majority of the people involved do
> > want compilers to not prevent forward progress. There is of course a
> > difference between a compiler establishing _eventual_ forward progress
> > (say, after 10 years) and forward progress within a small bounded
> > interval of time, but this is a QoI issue, and good compilers won't want
> > to introduce unnecessary latencies. I believe that it is fine if the
> > standard merely talks about eventual forward progress.
>
> The compiler will need to earn my trust on this one. ;-)
>
> > > > Now, there are things that are *not* fine, like speculative stores
> > > > that could be visible to other threads. Those are *bugs* (either in
> > > > the compiler or in the standard), and anybody who claims otherwise is
> > > > not worth discussing with.
> > >
> > > And as near as I can tell, volatile semantics are required in C11 to
> > > avoid speculative stores. I might be wrong about this, and hope that
> > > I am wrong. But I am currently not seeing it in the current standard.
> > > (Though I expect that most compilers would avoid speculating stores,
> > > especially in the near term.)
> >
> > This really depends on how we define speculative stores. The memory
> > model is absolutely clear that programs have to behave as if executed by
> > the abstract machine, and that rules out speculative stores to volatiles
> > and other locations. Under certain circumstances, there will be
> > "speculative" stores in the sense that they will happen at different
> > times than if you had a trivial implementation of the abstract machine.
> > But to be allowed to do that, the compiler has to prove that such a
> > transformation still fulfills the as-if rule.
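To illustrate the as-if argument: a compiler must not turn `if (cond) x = 1;` into an unconditional store followed by an undo, since a reader running between the two stores would see a value the abstract machine never wrote. C11 itself notes that transformations introducing assignments to a potentially shared location that the abstract machine would not modify are generally precluded. The minimal sketch below runs single-threaded and simulates the racing reader with a hypothetical observe() call placed where such a read could land; all names are illustrative.

```c
#include <stdbool.h>

static int x;   /* shared variable that another thread might read */

/* Hypothetical stand-in for a concurrent reader sampling x. */
static int observe(void)
{
	return x;
}

/* Source program: when cond is false, x is never set to 1,
 * so no reader can ever see x == 1. Returns what the reader saw. */
int as_written(bool cond)
{
	int seen;

	x = 0;
	if (cond)
		x = 1;
	seen = observe();
	return seen;
}

/* Forbidden rewrite of the same code: store unconditionally, then undo.
 * Single-threaded results match, but a reader that runs between the
 * two stores sees x == 1 even when cond is false. */
int speculated(bool cond)
{
	int old, seen;

	x = 0;
	old = x;
	x = 1;
	seen = observe();   /* the racing read lands here */
	if (!cond)
		x = old;
	return seen;
}
```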
>
> Agreed, although the as-if rule would ignore control dependencies, since
> these are not yet part of the standard (as you in fact note below).
> I nevertheless consider myself at least somewhat reassured that current
> C11 won't speculate stores. My remaining concerns involve the compiler
> proving to itself that a given branch is always taken, thus motivating
> it to optimize the branch away -- though this is more properly a
> control-dependency concern.
>
> > IOW, the abstract machine is what currently defines disallowed
> > speculative stores. If you want to put *further* constraints on what
> > implementations are allowed to do, I suppose it is best to talk about
> > those and see how we can add rules that allow programmers to express
> > those constraints. For example, control dependencies might be such a
> > case. I don't have a specific suggestion -- maybe the control
> > dependencies are best tackled similar to consume dependencies (even
> > though we don't have a good solution for those yet). But using
> > volatile accesses for that seems to be a big hammer, or even the wrong
> > one.
>
> In current compilers, the two hammers we have are volatile and barrier().
> But yes, it would be good to have something more focused. One option
> would be to propose memory_order_control loads to see how loudly the
> committee screams. One use case might be as follows:
>
> if (atomic_load(x, memory_order_control))
> 	atomic_store(y, memory_order_relaxed);
>
> This could also be written:
>
> r1 = atomic_load(x, memory_order_control);
> if (r1)
> 	atomic_store(y, memory_order_relaxed);
>
> A branch depending on the memory_order_control load could not be optimized
> out, though I suppose that the compiler could substitute a memory-barrier
> instruction for the branch. Seems like it would take a very large number
> of branches to equal the overhead of the memory barrier, though.
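For comparison, here is a sketch of what standard C11 offers today for the relaxed -> branch -> relaxed pattern, using hypothetical variables x, y, and data. The acquire load is the portable option; the relaxed load followed by atomic_thread_fence(memory_order_acquire) roughly corresponds to substituting a memory barrier for the branch. Both give ordering only when paired with a release store on the writer side.

```c
#include <stdatomic.h>

static atomic_int x, y;
static int data;   /* ordinary payload published through x */

/* Writer side: publish data, then set x with release ordering. */
void writer(void)
{
	data = 42;
	atomic_store_explicit(&x, 1, memory_order_release);
}

/* The pattern under discussion. C11 gives no ordering between the
 * relaxed load and the relaxed store: control dependencies are not
 * part of the memory model, and a compiler that proves the branch is
 * always taken may remove it. */
void relaxed_only(void)
{
	if (atomic_load_explicit(&x, memory_order_relaxed))
		atomic_store_explicit(&y, 1, memory_order_relaxed);
}

/* Portable option 1: an acquire load. If it reads the value written by
 * writer()'s release store, everything before that store (including
 * data = 42) happens before the store to y. */
void with_acquire(void)
{
	if (atomic_load_explicit(&x, memory_order_acquire))
		atomic_store_explicit(&y, 1, memory_order_relaxed);
}

/* Portable option 2: relaxed load, then an acquire fence. Paired with
 * writer()'s release store, the fence gives the same ordering as
 * option 1, at the cost of a barrier instruction on weakly ordered CPUs. */
void with_fence(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	if (r1)
		atomic_store_explicit(&y, 1, memory_order_relaxed);
}
```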
>
> Another option would be to flag the conditional expression, prohibiting
> the compiler from optimizing out any conditional branches. Perhaps
> something like this:
>
> r1 = atomic_load(x, memory_order_control);
> if (control_dependency(r1))
> 	atomic_store(y, memory_order_relaxed);

That's the one I had in mind and talked to you about earlier today. My
gut feeling is that this is preferable to the other because it "marks"
the if-statement, so the compiler knows exactly which branches matter.
I'm not sure one would need memory_order_control for that, if indeed
all you want is relaxed -> branch -> relaxed. But maybe there are
corner cases (see the weaker-than-relaxed discussion in SG1 today).
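One way to approximate a control_dependency() marker with today's GNU C compilers, offered as a hypothetical sketch rather than a proposal from this thread, is to pass the loaded value through an empty asm statement that claims to modify it, so the compiler can no longer prove the condition's value and fold the branch away (the kernel's OPTIMIZER_HIDE_VAR() uses a similar empty-asm trick). It emits no instructions and relies on the CPU respecting load-to-store control dependencies. It does not stop the compiler from hoisting identical stores out of both arms of the if, so it is weaker than a real language-level marker.

```c
#include <stdatomic.h>

/* HYPOTHETICAL control_dependency() marker (GNU C statement expression).
 * The empty asm claims to modify the value, hiding it from the optimizer
 * so the branch cannot be folded. Not a standard or kernel API. */
#define control_dependency(v) ({                  \
	__typeof__(v) cd_v_ = (v);                \
	__asm__ volatile("" : "+r"(cd_v_));       \
	cd_v_;                                    \
})

static atomic_int x, y;

/* relaxed -> marked branch -> relaxed, as in the quoted example, but
 * with memory_order_relaxed since memory_order_control does not exist.
 * Returns 1 if the store to y was performed. */
int try_publish(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_relaxed);

	if (control_dependency(r1)) {
		atomic_store_explicit(&y, 1, memory_order_relaxed);
		return 1;
	}
	return 0;
}
```

The sketch stays with memory_order_relaxed, in line with the point that relaxed -> branch -> relaxed may not need a new memory order at all.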

