Re: [PATCH] documentation: Fix two-CPU control-dependency example

From: Paul E. McKenney
Date: Thu Jul 20 2017 - 12:12:14 EST


On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:47, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> >> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
> >>>> On 2017/07/20 2:43, Paul E. McKenney wrote:
> >>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
> >>>>>> From: Akira Yokosawa <akiyks@xxxxxxxxx>
> >>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900
> >>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
> >>>>>>
> >>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
> >>>>>> no-transitivity example"), the operator in "if" statement of
> >>>>>> the two-CPU example was modified from ">=" to ">".
> >>>>>> Now the example misses the point because there is no party
> >>>>>> who will modify "x" nor "y". So each CPU performs only the
> >>>>>> READ_ONCE().
> >>>>>>
> >>>>>> The point of this example is to use control dependency for ordering,
> >>>>>> and the WRITE_ONCE() should always be executed.
> >>>>>>
> >>>>>> So it was correct prior to the above mentioned commit. Partial
> >>>>>> revert of the commit (with context adjustments regarding other
> >>>>>> changes thereafter) restores the point.
> >>>>>>
> >>>>>> Note that the three-CPU example demonstrating the lack of
> >>>>>> transitivity stands regardless of this partial revert.
> >>>>>>
> >>>>>> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
> >>>>>
> >>>>> Hello, Akira,
> >>>>>
> >>>>> You are quite right that many compilers would generate straightforward
> >>>>> code for the code fragment below, and in that case, the assertion could
> >>>>> never trigger due to either TSO or control dependencies, depending on
> >>>>> the architecture this was running on.
> >>>>>
> >>>>> However, if the compiler was too smart for our good, it could figure
> >>>>> out that "x" and "y" only take on the values zero and one, so that
> >>>>> the "if" would always be taken. At that point, the compiler could
> >>>>> simply ignore the "if" with the result shown below.
> >>>>>
> >>>>>> ---
> >>>>>> Documentation/memory-barriers.txt | 2 +-
> >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >>>>>> index c4ddfcd..c1ebe99 100644
> >>>>>> --- a/Documentation/memory-barriers.txt
> >>>>>> +++ b/Documentation/memory-barriers.txt
> >>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of
> >>>>>> CPU 0 CPU 1
> >>>>>> ======================= =======================
> >>>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> >>>>>> - if (r1 > 0) if (r2 > 0)
> >>>>>> + if (r1 >= 0) if (r2 >= 0)
> >>>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>>>>
> >>>>>> assert(!(r1 == 1 && r2 == 1));
> >>>>>
> >>>>> Original program:
> >>>>>
> >>>>> CPU 0 CPU 1
> >>>>> ======================= =======================
> >>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> >>>>> if (r1 >= 0) if (r2 >= 0)
> >>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>>>
> >>>>> assert(!(r1 == 1 && r2 == 1));
> >>>>>
> >>>>> Enthusiastically optimized program:
> >>>>>
> >>>>> CPU 0 CPU 1
> >>>>> ======================= =======================
> >>>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> >>>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>>>
> >>>>> assert(!(r1 == 1 && r2 == 1));
> >>>>>
> >>>>> This optimized version could trigger the assertion.
> >>>>>
> >>>>> So we do need to stick with the ">" comparison.
> >>>>
> >>>> Well but,
> >>>>
> >>>> Current example:
> >>>>
> >>>> CPU 0 CPU 1
> >>>> ======================= =======================
> >>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> >>>> if (r1 > 0) if (r2 > 0)
> >>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>>
> >>>> assert(!(r1 == 1 && r2 == 1));
> >>>>
> >>>> Such a clever compiler might be able to prove that "x" and "y"
> >>>> are never modified and end up in the following:
> >>>>
> >>
> >> Hi Akira,
> >>
> >> I wouldn't call that compiler a clever one, it's a broken one ;-)
> >>
> >> So here is the thing: READ_ONCE() is a *volatile* load, which means the
> >> compiler has to generate code that actually does a load, so the values
> >> of r1 and r2 depend on the loads, therefore, a sane compiler will not
> >> optimize the if()s out because the volatile semantics of READ_ONCE().
> >> Otherwise, I think we have a lot more to worry about than this case.
> >>
> >>>> CPU 0 CPU 1
> >>>> ======================= =======================
> >>>> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> >>>>
> >>>> assert(!(r1 == 1 && r2 == 1));
> >>>>
> >>>> This means it is impossible to describe this example in C,
> >>>> doesn't it?
> >>>
> >>> That is a matter of some debate, but it has gotten increasingly more
> >>> difficult to get C to do one's bidding over the decades.
> >>>
> >>>> What am I missing here?
> >>>
> >>> The compiler has to work harder in your example case, so it is probably
> >>> just a matter of time. In the original with ">=", all the compiler needed
> >>> to do was look at all the assignments and the initial value. In the
> >>> original, it also had to do reasoning based on control dependencies
> >>> (which are not yet part of the C standard).
> >>>
> >>> But yes, the degree to which a compiler can optimize atomics is a matter
> >>> of some debate. Here is a proposal to let the developer choose:
> >>>
> >>
> >> Hi Paul,
> >>
> >> I know the compiler could optimize atomics in very interesting ways, but
> >> this case is about volatile, so I guess our case is still fine? ;-)
> >
> > Hello, Boqun,
> >
> > When I asked that question, the answer I got was "the compiler must
> > emit the load instruction, but is under no obligation to actually use the
> > value loaded".
>
> So, it sounds like the following description found in memory-barriers.txt
> just above the example of lack of transitivity:
>
> > However, stores are not speculated. This means that ordering -is- provided
> > for load-store control dependencies, as in the following example:
> >
> > q = READ_ONCE(a);
> > if (q) {
> > WRITE_ONCE(b, 1);
> > }
> >
> > Control dependencies pair normally with other types of barriers.
> > That said, please note that neither READ_ONCE() nor WRITE_ONCE()
> > are optional! Without the READ_ONCE(), the compiler might combine the
> > load from 'a' with other loads from 'a'. Without the WRITE_ONCE(),
> > the compiler might combine the store to 'b' with other stores to 'b'.
> > Either can result in highly counterintuitive effects on ordering.
> >
> > Worse yet, if the compiler is able to prove (say) that the value of
> > variable 'a' is always non-zero, it would be well within its rights
> > to optimize the original example by eliminating the "if" statement
> > as follows:
> >
> > q = a;
> > b = 1; /* BUG: Compiler and CPU can both reorder!!! */
> >
> > So don't leave out the READ_ONCE().
>
> does not stand if the answer needs to be taken seriously.
> The READ_ONCE() is not good enough to prevent the "if" statement
> from being eliminated.
>
> Hmm... If we do need to care about such an extreme optimization,
> we should not rely on any control dependency in C source code
> at least until the compiler gets educated about control dependency.
>
> Is there some reasonable compromise?

Right now, what we have are the volatile loads such as from READ_ONCE()
and WRITE_ONCE(). My defense of them has been that they need to properly
handle MMIO. But not all compiler writers agree with me, so we should
program at least a bit defensively. Though we should probably also
be writing tests to verify that the compiler is doing the right thing,
and those litmus tests should probably include your ">=" litmus test.
But we should not encourage developers to rely on it quite yet.

My current position is that compiler writers are currently unlikely to
make their compilers do deep reasoning around the memory model, and that
they are currently unlikely to do cross-translation-unit assigned-value
analysis (though LTO might be changing that). But given a static atomic
that is visible only within a translation unit, I would expect them
to do value-range analysis. Hence my reluctance to present the ">="
variant as a pattern to follow.

Thanx, Paul


> Thanks, Akira
>
> >
> > I don't happen to like that answer, by the way. ;-)
> >
> > Thanx, Paul
> >
> >>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
> >>>
> >>
> >> Great material to wake up mind in the morning! Thanks.
> >>
> >> Regards,
> >> Boqun
> >>
> >>> What are your thoughts on this?
> >>>
> >>> Thanx, Paul
> >>>
> >>>> Thanks, Akira
> >>>>
> >>>>> That said, I very much welcome critical reviews of memory-barriers.txt,
> >>>>> so please do feel free to continue doing that!
> >>>>>
> >>>>> Thanx, Paul
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
>