Re: [PATCH tip/core/rcu 1/9] rcu: Provide GP ordering in face of migrations and delays

From: Paul E. McKenney
Date: Fri Oct 06 2017 - 15:18:42 EST


On Fri, Oct 06, 2017 at 11:07:23AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 05, 2017 at 11:22:04AM -0700, Paul E. McKenney wrote:
> > Hmmm... Here is what I was worried about:
> >
> > C C-PaulEMcKenney-W+RWC4+2017-10-05
> >
> > {
> > }
> >
> > P0(int *a, int *x)
> > {
> > WRITE_ONCE(*a, 1);
> > smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> > WRITE_ONCE(*x, 1);
> > }
> >
> > P1(int *x, int *y, spinlock_t *l)
> > {
> > r3 = READ_ONCE(*x);
> > smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> > spin_lock(l); /* Locking in complete(). */
> > WRITE_ONCE(*y, 1);
> > spin_unlock(l);
> > }
> >
> > P2(int *y, int *b, spinlock_t *l)
> > {
> > spin_lock(l); /* Locking in wait_for_completion. */
> > r4 = READ_ONCE(*y);
> > spin_unlock(l);
> > r1 = READ_ONCE(*b);
> > }
> >
> > P3(int *b, int *a)
> > {
> > WRITE_ONCE(*b, 1);
> > smp_mb();
> > r2 = READ_ONCE(*a);
> > }
> >
> > exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
>
> /me goes and install this herd thing again.. I'm sure I had it running
> _somewhere_.. A well.
>
> C C-PaulEMcKenney-W+RWC4+2017-10-05
>
> {
> }
>
> P0(int *a, int *x)
> {
> WRITE_ONCE(*a, 1);
> smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> WRITE_ONCE(*x, 1);
> }
>
> P1(int *x, int *y)
> {
> r3 = READ_ONCE(*x);
> smp_mb(); /* Lock acquisition for rcu_node ->lock. */
> smp_store_release(y, 1);
> }
>
> P2(int *y, int *b)
> {
> r4 = smp_load_acquire(y);
> r1 = READ_ONCE(*b);
> }
>
> P3(int *b, int *a)
> {
> WRITE_ONCE(*b, 1);
> smp_mb();
> r2 = READ_ONCE(*a);
> }
>
> exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
>
>
> Is what I was thinking of, I think that is the minimal ordering
> complete()/wait_for_completion() need to provide.

OK, I will bite... What do the smp_store_release() and the
smp_load_acquire() correspond to? I see just plain locking in
wait_for_completion() and complete().

> (also, that r# numbering confuses the hell out of me, its not related to
> P nor to the variables)

Yeah, it is random, sorry!!!

> Test C-PaulEMcKenney-W+RWC4+2017-10-05 Allowed
> States 15
> 1:r3=0; 2:r1=0; 2:r4=0; 3:r2=0;
> 1:r3=0; 2:r1=0; 2:r4=0; 3:r2=1;
> 1:r3=0; 2:r1=0; 2:r4=1; 3:r2=0;
> 1:r3=0; 2:r1=0; 2:r4=1; 3:r2=1;
> 1:r3=0; 2:r1=1; 2:r4=0; 3:r2=0;
> 1:r3=0; 2:r1=1; 2:r4=0; 3:r2=1;
> 1:r3=0; 2:r1=1; 2:r4=1; 3:r2=0;
> 1:r3=0; 2:r1=1; 2:r4=1; 3:r2=1;
> 1:r3=1; 2:r1=0; 2:r4=0; 3:r2=0;
> 1:r3=1; 2:r1=0; 2:r4=0; 3:r2=1;
> 1:r3=1; 2:r1=0; 2:r4=1; 3:r2=1;
> 1:r3=1; 2:r1=1; 2:r4=0; 3:r2=0;
> 1:r3=1; 2:r1=1; 2:r4=0; 3:r2=1;
> 1:r3=1; 2:r1=1; 2:r4=1; 3:r2=0;
> 1:r3=1; 2:r1=1; 2:r4=1; 3:r2=1;
> No
> Witnesses
> Positive: 0 Negative: 15
> Condition exists (1:r3=1 /\ 2:r4=1 /\ 2:r1=0 /\ 3:r2=0)
> Observation C-PaulEMcKenney-W+RWC4+2017-10-05 Never 0 15
> Time C-PaulEMcKenney-W+RWC4+2017-10-05 0.04
> Hash=f7f8ad6eab33e90718a394bcb021557d

But yes, looking closer, this corresponds to the rule of thumb about
non-rf relations and full memory barriers. We have two non-rf relations
(P2->P3 and P3->P0), so we need two full barriers, one each between the
non-rf relations.

So I dropped that patch yesterday. The main thing I was missing was
that there is no ordering-free fastpath in wait_for_completion() and
complete(): Each unconditionally acquires the lock. So the smp_mb()
that I was trying to add doesn't need to be there.

Thanx, Paul