Re: wake_up_process implied memory barrier clarification

From: Boqun Feng
Date: Sat Aug 29 2015 - 05:26:22 EST


Hi Oleg

On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> On 08/28, Michal Hocko wrote:
> >
> > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > On 08/27, Michal Hocko wrote:
> > > >
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits
> > > > <general barrier> STORE current->state
> > > > LOAD event_indicated
> > > >
> > > > +Please note that wake_up_process is an exception here because it implies
> > > > +the write memory barrier unconditionally.
> > > > +
> > >
> > > I simply can't understand (can't even parse) this part of memory-barriers.txt.
> >
> > Do you mean the added text or the example above it?
>
> Both ;)
>
> but note that "load from X might see 0" is true of course, and in this

By this, I think you actually means the example below the added text,
i.e. the example for "to repeat..", right?

I think that's not a good example, and actually that example explains
that the barriers in -wait_event()- rather than in -wake_up()- are
conditional.

I wrote a patch to make things more clear, hope that helps.

(Add Paul and Jonathan to CCed)

Regards,
Boqun

> sense wake_up_process() is not exception:
>
> X = 1;
> wmb(); // unless I am totally confused this just adds more confusion
> Y = 1;
> wake_up_process(TASK);
>
> vs TASK doing
>
> for (;;) {
> set_current_state(...);
> if (Y)
> break;
> schedule();
> }
>
> BUG_ON(X == 0)
>
> is not correct, it can actually can hit the BUG_ON() above. However, if
> wake_up_process() actually wakes a sleeping TASK up, then it should also
> see X = 1. Even without wmb(), even if we do
>
> Y = 1;
> X = 1;
> wake_up_process(TASK);
>

----------->8
Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()

The memory barriers in some sleep and wakeup functions are conditional,
there are several situations that there is no barriers:

1. If wait_event() and co. actually don't prepare to sleep, there
may be no barrier in them.

2. No matter whether a sleep occurs or not, there may be no barrier
between a successful wait-condition checking(the result of which
is true) in wait_event() and the following instructions.

3. If wake_up() and co. actually wake up no one, there may be no
write barrier in them.

However, the current version of memory-barriers.txt doesn't call these
out. Further more, the example which wanted to explain that write
barriers in wake_up*() are conditional fails its job. To see that,
consider a similar example:

CPU 1 CPU 2
=============================== ===============================
X = 1;
smp_mb();
Y = 1; wait_event(wq, Y == 1);
load from Y sees 1, no memory barrier
smp_wmb();
wake_up();
load from X might see 0

CPU 2's load from X might still be 0 even if we add a write barrier
explicitly, which means that even if wake_up() guarantees a write
barrier, the original example still doesn't have the desired order
guarantee. In fact, the original example can explains the conditionality
of barriers in wait_*() rather than wake_up*().

This patch makes things clear, by calling out that the barriers in
wait_*() and wake_up*() are conditional and giving exact examples to
explain that.

Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
---

Documentation/memory-barriers.txt | 81 +++++++++++++++++++++++++++++++--------
1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index eafa6a5..df69841 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:

which therefore also imply a general memory barrier after setting the state.
The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+imply a general barrier if and only if a sleep is at least about to happen,
+i.e. prepare_to_wait*() is called.

wait_event();
wait_event_interruptible();
@@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
wait_on_bit();
wait_on_bit_lock();

+Further more, no barrier is guaranteed after the successful wait condition
+checkings, whose results are true, in wait_*() and before the instructions
+following wait_*().

Secondly, code that performs a wake up normally follows something like this:

@@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to set TASK_RUNNING:
<general barrier> STORE current->state
LOAD event_indicated

-To repeat, this write memory barrier is present if and only if something
-is actually awakened. To see this, consider the following sequence of
-events, where X and Y are both initially zero:
-
- CPU 1 CPU 2
- =============================== ===============================
- X = 1; STORE event_indicated
- smp_mb(); wake_up();
- Y = 1; wait_event(wq, Y == 1);
- wake_up(); load from Y sees 1, no memory barrier
- load from X might see 0
-
-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
-
The available waker functions include:

complete();
@@ -2042,6 +2031,66 @@ The available waker functions include:
wake_up_poll();
wake_up_process();

+To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning
+they are present if and only if a sleep or a wakeup actually occurs. To see
+this, consider the following three examples.
+
+The first example is for the conditional barriers in wait_*(), say X and Y
+are both initially zero:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ X = 1;
+ smp_mb();
+ wait_event(wq, Y == 1); Y = 1;
+ load from Y sees 1 wake_up();
+ <no memory barrier>
+ load from X might see 0
+
+And even if a sleep and a wakeup really occurs, there might be no barrier
+between the last load from Y(which the makes wait condition checking succeed)
+and load from X, so that load from X might still see 0. The second example shows
+that, the code running on CPU 1 & 2 is the same with the first example, however
+the sequence of events are different, say X, Y and Z are all initially zero,
+and another task may be waiting in wait_event(wq, Z == 1) on CPU 4:
+
+ CPU 1 CPU 2 CPU 3
+ =============================== =============================== ===============================
+ wait_event(wq, Y == 1); Z = 1;
+ load from Y sees 0
+ prepare_to_wait_event();
+ <general barrier>
+ schedule();
+ <general barrier>
+ wake_up();
+ prepare_to_wait_event();
+ <general barrier>
+ X = 1;
+ smp_mb();
+ load from Y sees 1 Y = 1;
+ <no memory barrier> wake_up();
+ load from X might see 0
+
+So that, to guarantee that CPU 1's load from X is 1 in the two examples above,
+a read barrier must be added after wait_event() in the code running on CPU 1.
+
+The third example is for the conditional write barriers in wake_up*(), say
+X, Y and Z are all initially zero:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ X = 1;
+ wait_event(wq, Y == 1); Y = 1;
+ load from Y sees 1 wake_up();
+ <no memory barrier>
+ Z = 1;
+ load from Z sees 1
+ smp_rmb();
+ load from X might see 0
+
+In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed
+to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be
+added between store to X and store to Z in the code running on CPU 2.

[!] Note that the memory barriers implied by the sleeper and the waker do _not_
order multiple stores before the wake-up with respect to loads of those stored

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/