Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

From: Paul E. McKenney
Date: Sun Apr 26 2015 - 07:08:31 EST


On Fri, Feb 20, 2015 at 07:45:51PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 20, 2015 at 07:28:16PM +0100, Manfred Spraul wrote:
>
> > >We need the full barrier to serialize STORE's as well, but probably we can
> > >rely on control dependancy and thus we only need rmb().
> > Do we need a full barrier or not?
> >
> > I don't manage to create a proper line of reasoning.
>
> I think I agree with Oleg in that we only need the smp_rmb(); of course
> that wants a somewhat elaborate comment to go along with it. How about
> something like so:
>
> spin_unlock_wait(&local);
> /*
> * The above spin_unlock_wait() forms a control dependency with
> * any following stores; because we must first observe the lock
> * unlocked and we cannot speculate stores.
> *
> * Subsequent loads however can easily pass through the loads
> * represented by spin_unlock_wait() and therefore we need the
> * read barrier.
> *
> * This together is stronger than ACQUIRE for @local and
> * therefore we will observe the complete prior critical section
> * of @local.
> */
> smp_rmb();
>
> The obvious alternative is using spin_unlock_wait() with an
> smp_load_acquire(), but that might be more expensive on some archs due
> to repeated issuing of memory barriers.

Please accept my apologies for the late reply. And thank you to Peter
for prodding me (again!) to think more carefully about this. Because we
have a bit of a comedy of errors at the moment. I will start with
mine, namely that we need to support DEC Alpha, which means we cannot
use ACCESS_ONCE() or READ_ONCE() to head a control-dependency chain.
Alpha needs an mb instruction for control dependencies, just as it does
for data dependencies.

I have the following (untested, probably does not even build) patch
queued to introduce READ_ONCE_CTRL() to head control depedencies, making
them safe for DEC Alpha. And, just as important, also making them more
self-documenting.

But the fun doesn't end there! Some architectures have "interesting"
implementations of spin_unlock_wait() and arch_spin_is_locked(). I will
deal with these in a follow-up email.

In the meantime, thoughts on the patch below?

Thanx, Paul

------------------------------------------------------------------------

smp: Make control dependencies work on Alpha, improve documentation

The current formulation of control dependencies fails on DEC Alpha, which
does not respect dependencies of any kind unless an explicit memory is
provided. This means that the current fomulation of control dependencies
fails on Alpha. This commit therefore creates a READ_ONCE_CTRL() that
has the same overhead on non-Alpha systems, but causes Alpha to produce
the needed ordering. This commit also applies READ_ONCE_CTRL() to the
one known use of control dependencies.

Use of READ_ONCE_CTRL() also has the beneficial effect of adding a bit
of self-documentation to control dependencies.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 52c320e3f107..16897552dc1c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -617,16 +617,16 @@ case what's actually required is:
However, stores are not speculated. This means that ordering -is- provided
for load-store control dependencies, as in the following example:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
ACCESS_ONCE(b) = p;
}

-Control dependencies pair normally with other types of barriers.
-That said, please note that ACCESS_ONCE() is not optional! Without the
-ACCESS_ONCE(), might combine the load from 'a' with other loads from
-'a', and the store to 'b' with other stores to 'b', with possible highly
-counterintuitive effects on ordering.
+Control dependencies pair normally with other types of barriers. That
+said, please note that READ_ONCE_CTRL() is not optional! Without the
+READ_ONCE_CTRL(), the compiler might combine the load from 'a' with
+other loads from 'a', and the store to 'b' with other stores to 'b',
+with possible highly counterintuitive effects on ordering.

Worse yet, if the compiler is able to prove (say) that the value of
variable 'a' is always non-zero, it would be well within its rights
@@ -636,12 +636,15 @@ as follows:
q = a;
b = p; /* BUG: Compiler and CPU can both reorder!!! */

-So don't leave out the ACCESS_ONCE().
+Finally, the READ_ONCE_CTRL() includes an smp_read_barrier_depends()
+that DEC Alpha needs in order to respect control depedencies.
+
+So don't leave out the READ_ONCE_CTRL().

It is tempting to try to enforce ordering on identical stores on both
branches of the "if" statement as follows:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
barrier();
ACCESS_ONCE(b) = p;
@@ -655,7 +658,7 @@ branches of the "if" statement as follows:
Unfortunately, current compilers will transform this as follows at high
optimization levels:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
barrier();
ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
if (q) {
@@ -685,7 +688,7 @@ memory barriers, for example, smp_store_release():
In contrast, without explicit memory barriers, two-legged-if control
ordering is guaranteed only when the stores differ, for example:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q) {
ACCESS_ONCE(b) = p;
do_something();
@@ -694,14 +697,14 @@ ordering is guaranteed only when the stores differ, for example:
do_something_else();
}

-The initial ACCESS_ONCE() is still required to prevent the compiler from
-proving the value of 'a'.
+The initial READ_ONCE_CTRL() is still required to prevent the compiler
+from proving the value of 'a'.

In addition, you need to be careful what you do with the local variable 'q',
otherwise the compiler might be able to guess the value and again remove
the needed conditional. For example:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (q % MAX) {
ACCESS_ONCE(b) = p;
do_something();
@@ -714,7 +717,7 @@ If MAX is defined to be 1, then the compiler knows that (q % MAX) is
equal to zero, in which case the compiler is within its rights to
transform the above code into the following:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = p;
do_something_else();

@@ -725,7 +728,7 @@ is gone, and the barrier won't bring it back. Therefore, if you are
relying on this ordering, you should make sure that MAX is greater than
one, perhaps as follows:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
if (q % MAX) {
ACCESS_ONCE(b) = p;
@@ -742,14 +745,15 @@ of the 'if' statement.
You must also be careful not to rely too much on boolean short-circuit
evaluation. Consider this example:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
if (a || 1 > 0)
ACCESS_ONCE(b) = 1;

-Because the second condition is always true, the compiler can transform
-this example as following, defeating control dependency:
+Because the first condition cannot fault and the second condition is
+always true, the compiler can transform this example as following,
+defeating control dependency:

- q = ACCESS_ONCE(a);
+ q = READ_ONCE_CTRL(a);
ACCESS_ONCE(b) = 1;

This example underscores the need to ensure that the compiler cannot
@@ -762,8 +766,8 @@ demonstrated by two related examples, with the initial values of
x and y both being zero:

CPU 0 CPU 1
- ===================== =====================
- r1 = ACCESS_ONCE(x); r2 = ACCESS_ONCE(y);
+ ======================= =======================
+ r1 = READ_ONCE_CTRL(x); r2 = READ_ONCE_CTRL(y);
if (r1 > 0) if (r2 > 0)
ACCESS_ONCE(y) = 1; ACCESS_ONCE(x) = 1;

@@ -783,7 +787,8 @@ But because control dependencies do -not- provide transitivity, the above
assertion can fail after the combined three-CPU example completes. If you
need the three-CPU example to provide ordering, you will need smp_mb()
between the loads and stores in the CPU 0 and CPU 1 code fragments,
-that is, just before or just after the "if" statements.
+that is, just before or just after the "if" statements. Furthermore,
+the original two-CPU example is very fragile and should be avoided.

These two examples are the LB and WWC litmus tests from this paper:
http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
@@ -791,6 +796,12 @@ site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.

In summary:

+ (*) Control dependencies must be headed by READ_ONCE_CTRL().
+ Or, as a much less preferable alternative, interpose
+ be headed by READ_ONCE() or an ACCESS_ONCE() read and must
+ have smp_read_barrier_depends() between this read and the
+ control-dependent write.
+
(*) Control dependencies can order prior loads against later stores.
However, they do -not- guarantee any other sort of ordering:
Not prior loads against later loads, nor prior stores against
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 1b45e4a0519b..a57eacde2b84 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -264,6 +264,22 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
#define WRITE_ONCE(x, val) \
({ typeof(x) __val = (val); __write_once_size(&(x), &__val, sizeof(__val)); __val; })

+/**
+ * READ_ONCE_CTRL - Read a value heading a control dependency
+ * @x: The value to be read, heading the control dependency
+ *
+ * Control dependencies are tricky. See Documentation/memory-barriers.txt
+ * for important information on how to use them. Note that in many cases,
+ * use of smp_load_acquire() will be much simpler. Control dependencies
+ * should be avoided except on the hottest of hotpaths.
+ */
+#define READ_ONCE_CTRL(x) \
+({ \
+ typeof(x) __val = READ_ONCE(x); \
+ smp_read_barrier_depends(); /* Enforce control dependency. */ \
+ __val; \
+})
+
#endif /* __KERNEL__ */

#endif /* __ASSEMBLY__ */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index eadb95ce7aac..67548d5de4cb 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -141,7 +141,7 @@ int perf_output_begin(struct perf_output_handle *handle,
perf_output_get_handle(handle);

do {
- tail = ACCESS_ONCE(rb->user_page->data_tail);
+ tail = READ_ONCE_CTRL(rb->user_page->data_tail);
offset = head = local_read(&rb->head);
if (!rb->overwrite &&
unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/