Re: [RFC][PATCH 1/3] locking: Introduce smp_acquire__after_ctrl_dep

From: Paul E. McKenney
Date: Wed May 25 2016 - 10:29:44 EST


On Wed, May 25, 2016 at 01:39:30PM +0800, Boqun Feng wrote:
> On Tue, May 24, 2016 at 09:53:29PM -0700, Paul E. McKenney wrote:
> > On Tue, May 24, 2016 at 11:01:21PM -0400, Waiman Long wrote:
> > > On 05/24/2016 10:27 AM, Peter Zijlstra wrote:
> > > >Introduce smp_acquire__after_ctrl_dep(), this construct is not
> > > >uncommon, but the lack of this barrier is.
> > > >
> > > >Signed-off-by: Peter Zijlstra (Intel)<peterz@xxxxxxxxxxxxx>
> > > >---
> > > > include/linux/compiler.h | 14 ++++++++++----
> > > > ipc/sem.c | 14 ++------------
> > > > 2 files changed, 12 insertions(+), 16 deletions(-)
> > > >
> > > >--- a/include/linux/compiler.h
> > > >+++ b/include/linux/compiler.h
> > > >@@ -305,20 +305,26 @@ static __always_inline void __write_once
> > > > })
> > > >
> > > > /**
> > > >+ * smp_acquire__after_ctrl_dep() - Provide ACQUIRE ordering after a control dependency
> > > >+ *
> > > >+ * A control dependency provides a LOAD->STORE order, the additional RMB
> > > >+ * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
> > > >+ * aka. ACQUIRE.
> > > >+ */
> > > >+#define smp_acquire__after_ctrl_dep() smp_rmb()
> > > >+
> > > >+/**
> > > > * smp_cond_acquire() - Spin wait for cond with ACQUIRE ordering
> > > > * @cond: boolean expression to wait for
> > > > *
> > > > * Equivalent to using smp_load_acquire() on the condition variable but employs
> > > > * the control dependency of the wait to reduce the barrier on many platforms.
> > > > *
> > > >- * The control dependency provides a LOAD->STORE order, the additional RMB
> > > >- * provides LOAD->LOAD order, together they provide LOAD->{LOAD,STORE} order,
> > > >- * aka. ACQUIRE.
> > > > */
> > > > #define smp_cond_acquire(cond) do { \
> > > > while (!(cond)) \
> > > > cpu_relax(); \
> > > >- smp_rmb(); /* ctrl + rmb := acquire */ \
> > > >+ smp_acquire__after_ctrl_dep(); \
> > > > } while (0)
> > > >
> > > >
> > >
> > > I have a question about the claim that control dependence + rmb is
> > > equivalent to an acquire memory barrier. For example,
> > >
> > > S1: if (a)
> > > S2: b = 1;
> > > smp_rmb()
> > > S3: c = 2;
> > >
> > > Since c is independent of both a and b, is it possible that the CPU
> > > may reorder the store in S3 so that it executes before S1 and S2?
> >
> > The CPUs I know of won't do so, nor should the compiler, at least assuming
> > "a" (AKA "cond") includes READ_ONCE(). Ditto "b" and WRITE_ONCE().
> > Otherwise, the compiler could do quite a few "interesting" things,
> > especially if it knows the value of "b". For example, if the compiler
> > knows that b==1, without the volatile casts, the compiler could just
> > throw away both S1 and S2, eliminating any ordering. This can get
> > quite tricky -- see memory-barriers.txt for more mischief.
> >
> > The smp_rmb() is not needed in this example because S3 is a write, not
>
> but S3 needs to be a WRITE_ONCE(), right? IOW, the following code can
> result in reordering:
>
> S1: if (READ_ONCE(a))
> S2: WRITE_ONCE(b, 1);
>
> S3: c = 2; // this can be reordered before READ_ONCE(a)
>
> but if we change S3 to WRITE_ONCE(c, 2), the reordering cannot happen
> for the CPUs you are aware of, right?

Yes, if you remove the smp_rmb(), you also need a WRITE_ONCE() for S3.

Even with the smp_rmb(), you have to be careful.

In general, if you don't tell the compiler otherwise, it is within its
rights to assume that nothing else is reading from or writing to the
variables in question. That means that it can split and fuse loads
and stores. Or keep some of the variables in registers, so that it
never loads and stores them. ;-)
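
For illustration, here is a minimal userspace sketch of the two sequences
discussed in this thread. It is only a sketch under stated assumptions: the
READ_ONCE() and WRITE_ONCE() definitions below are simplified volatile-cast
stand-ins rather than the kernel's actual macros, smp_rmb() is approximated
by a C11 acquire fence, __typeof__ is a GCC/Clang extension, and the two
function names are hypothetical.

```c
#include <stdatomic.h>

/* Simplified stand-ins (assumption: NOT the kernel's real definitions).
 * The volatile casts tell the compiler that each access must be emitted
 * exactly once and may not be fused, split, or cached in a register. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Assumption: a C11 acquire fence approximates smp_rmb() in userspace. */
#define smp_rmb()        atomic_thread_fence(memory_order_acquire)

static int a, b, c;

/* No smp_rmb(): the store to c is marked with WRITE_ONCE() so that the
 * compiler cannot hoist it above the load of a. (Hypothetical name.) */
static void ctrl_dep_then_store(void)
{
	if (READ_ONCE(a))		/* S1 */
		WRITE_ONCE(b, 1);	/* S2 */
	WRITE_ONCE(c, 2);		/* S3 */
}

/* Control dependency plus read barrier: the load of a is ordered before
 * the later load of c. (Hypothetical name.) */
static int ctrl_dep_then_load(void)
{
	if (READ_ONCE(a))
		WRITE_ONCE(b, 1);
	smp_rmb();
	return READ_ONCE(c);
}
```

A single-threaded run of this only shows the resulting values, not the
ordering; observing the ordering would take a second thread (or a litmus
test) racing on a, b, and c.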

Thanx, Paul

> Regards,
> Boqun
>
> > a read. Perhaps you meant something more like this:
> >
> > if (READ_ONCE(a))
> > WRITE_ONCE(b, 1);
> > smp_rmb();
> > r1 = READ_ONCE(c);
> >
> > This sequence would guarantee that "a" was read before "c".
> >
> > Thanx, Paul
> >