Re: [PATCH] documentation: Fix two-CPU control-dependency example

From: Paul E. McKenney
Date: Sun Jul 23 2017 - 00:43:31 EST

Next message: kbuild test robot: "include/linux/kernel.h:860:32: error: dereferencing pointer to incomplete type 'struct clock_event_device'"
Previous message: Joel Fernandes: "Re: [PATCH v6] cpufreq: schedutil: Make iowait boost more energy efficient"
In reply to: Akira Yokosawa: "Re: [PATCH] documentation: Fix two-CPU control-dependency example"
Next in thread: Boqun Feng: "Re: [PATCH] documentation: Fix two-CPU control-dependency example"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Jul 22, 2017 at 08:38:57AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 16:07:14 -0700, Paul E. McKenney wrote:
> > On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> [...]
> >>> For the compilers I know about at the present time, yes.
> >>
> >> So if I respin the patch with the extern, would you still feel reluctant?
> >
> > Yes, because I am not seeing how this change helps. What is this telling
> > the reader that the original did not, and how does it help the reader
> > generate good concurrent code?
>
> Well, what bothers me in the ">" version of two-CPU example is the
> explanation in memory-barriers.txt that follows:
>
> > These two examples are the LB and WWC litmus tests from this paper:
> > http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
> > site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
>
> I'm wondering if calling the ">" version as an "LB" litmus test is correct.
> Because it always results in "r1 == 0 && r2 == 0", 100%.

As it should, because nothing can become non-zero unless something was
already non-zero. It is possible to create similarly single-outcome
tests with address dependencies.

But yes, converting to ">=" makes the stores unconditional, and thus
allows more outcomes. Perhaps we need both? Another approach is to
write a second value in an "else" statement, keeping the original ">".

I agree that some of these examples can be a bit hard on one's intuition,
but the plain fact is that when it comes to memory models, one's intuition
is not always one's friend.

> An LB litmus test with full memory barriers would be:
>
> CPU 0 CPU 1
> ======================= =======================
> r1 = READ_ONCE(x); r2 = READ_ONCE(y);
> smp_mb(); smp_mb();
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>
> assert(!(r1 == 1 && r2 == 1));
>
> and this will result in either of
>
> r1 == 0 && r2 == 0
> r1 == 0 && r2 == 1
> r1 == 1 && r2 == 0
>
> but never "r1 == 1 && r2 == 1".

Agreed, because unlike the control-dependency example, the WRITE_ONCE()s
happen unconditionally.

> The difference in the behavior distracts me in reading this part
> of memory-barriers.txt.

Then it likely needs more explanation.

> Your priority seemed to be in reducing the chance of the "if" statement
> to be optimized away. So I suggested to use "extern" as a compromise.

If the various tools accept the "extern", this might not be a bad thing
to do.

But what this really means is that I need to take another tilt at
the "volatile" windmill in the committee.

> Another way would be to express the ">=" version in a pseudo-asm form.
>
> CPU 0 CPU 1
> ======================= =======================
> r1 = LOAD x r2 = LOAD y
> if (r1 >= 0) if (r2 >= 0)
> STORE y = 1 STORE x = 1
>
> assert(!(r1 == 1 && r2 == 1));
>
> This should eliminate any concern of compiler optimization.
> In this final part of CONTROL DEPENDENCIES section, separating the
> problem of optimization and transitivity would clarify the point
> (at least for me).

The problem is that people really do use C-language control dependencies
in the Linux kernel, so we need to describe them. Maybe someday it
will be necessary to convert them to asm, but I am hoping that we can
avoid that.

> Thoughts?

My hope is that the memory model can help here, but that will in any
case take time.

Thanx, Paul

> Regards, Akira
>
> >
> > Thanx, Paul
> >
> >> Regards, Akira
> >>
> >>>
> >>> The underlying problem is that the standard says almost nothing about what
> >>> volatile does. I usually argue that it was intended to be used for MMIO,
> >>> so any optimization that would prevent a device driver from working must
> >>> be prohibited on volatiles. A lot of people really don't like volatile,
> >>> and would like to eliminate it, and these people also aren't very happy
> >>> about any attempt to better define volatile.
> >>>
> >>> Keeps things entertaining. ;-)
> >>>
> >>> Thanx, Paul
> >>>
>

Next message: kbuild test robot: "include/linux/kernel.h:860:32: error: dereferencing pointer to incomplete type 'struct clock_event_device'"
Previous message: Joel Fernandes: "Re: [PATCH v6] cpufreq: schedutil: Make iowait boost more energy efficient"
In reply to: Akira Yokosawa: "Re: [PATCH] documentation: Fix two-CPU control-dependency example"
Next in thread: Boqun Feng: "Re: [PATCH] documentation: Fix two-CPU control-dependency example"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]