Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

From: Oleg Nesterov
Date: Thu Feb 19 2015 - 09:21:10 EST


On 02/18, Peter Zijlstra wrote:
>
> On Wed, Feb 18, 2015 at 08:14:01PM +0100, Manfred Spraul wrote:
>
> > >spinlock_t local, global;
> > >bool force_global;

Yes, force_global (sma->complex_count) adds more complications, but
I think we can ignore it in this discussion.

> > >bool my_lock(bool try_local)
> > >{
> > >	if (try_local) {
> > >		spin_lock(&local);
> > >		if (!spin_is_locked(&global)) {
> > >			if (!force_global) {
> > >				return true;
> > >			}
> > >		}
> > >		spin_unlock(&local);
> > >	}
> > >
> > >	spin_lock(&global);
> > >	spin_unlock_wait(&local);
> > >	return false;
> > >}
> > >
> > >void my_unlock(bool drop_local)
> > >{
> > >	if (drop_local)
> > >		spin_unlock(&local);
> > >	else
> > >		spin_unlock(&global);
> > >}
>
> > >Another question is whether we need a barrier AFTER spin_unlock_wait(). I do
> > >not know what ipc/sem.c actually needs, but in general (I think) this does
> > >need mb(). Otherwise my_lock / my_unlock itself does not have the proper
> > >acq/rel semantics. For example, my_lock(false) can miss the changes that
> > >were done under my_lock(true).
>
> > How could that happen?
> > I thought that
> >
> > 	thread A:
> > 		protected_var = 1234;
> > 		spin_unlock(&lock_a);
> >
> > 	thread B:
> > 		spin_lock(&lock_b);
> > 		if (protected_var)
> >
> > is safe, i.e. there is no need for the acquire and release to be done
> > on the same lock.
>
> Well, just those four statements can of course be executed like:
>
> 	CPU0				CPU1
>
> 					spin_lock(&b)
> 					if (prot_var)
>
> 	prot_var = 1;
> 	spin_unlock(&a);
>
> And you would see the old var. Locks a and b are completely independent
> here.
>
> Now of course the local/global thing in sysvsem is more complex.
>
> As to what Oleg meant:
>
> 	X := 0
>
> 	CPU0				CPU1
>
> 	spin_lock(&global);
> 					spin_lock(&local);
> 					X = 1;
> 					spin_unlock(&local);
> 	spin_unlock_wait(&local);
>
> 	assert(X == 1); /* BOOM */
>
> that assert can trigger: because spin_unlock_wait() is only reads, the
> read of X can be lifted over and above it, before the assignment of X on
> CPU1.
>
> Again, the sysvsem code is slightly more complex, but I think Oleg is
> right, there is no guarantee you'll observe the full critical section of
> sem->lock if sem_lock() takes the slow path and does sem_wait_array(),
> because of the above.

Yes, thanks Peter.

Or another artificial example:

	int X = 0, Y = 0;

	void func(void)
	{
		bool xxx = my_lock(rand());

		BUG_ON(X != Y);

		++X; ++Y;

		my_unlock(xxx);
	}

If func() can race with itself, it can hit the BUG_ON() above unless my_lock()
has the barriers after spin_unlock_wait() and spin_is_locked().

We need a full barrier to serialize the STOREs as well, but probably we can
rely on the control dependency and thus we only need rmb().
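
In pseudo-code terms, here is a sketch of where the barriers would go under
that reasoning (whether rmb() is really strong enough everywhere is exactly
the open question; the comments are mine):

	bool my_lock(bool try_local)
	{
		if (try_local) {
			spin_lock(&local);
			if (!spin_is_locked(&global)) {
				/*
				 * spin_is_locked() is only a LOAD: order the
				 * LOADs in the critical section after it.
				 * The STOREs are already ordered by the
				 * control dependency on the check itself.
				 */
				smp_rmb();
				if (!force_global)
					return true;
			}
			spin_unlock(&local);
		}

		spin_lock(&global);
		spin_unlock_wait(&local);
		/*
		 * spin_unlock_wait() is only LOADs too: without a barrier
		 * the LOADs in the critical section can be satisfied before
		 * we observe the local-lock holder's unlock, as in the
		 * example above.
		 */
		smp_rmb();
		return false;
	}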

And note that sem_lock() already has rmb() after spin_is_locked(), to ensure
that we can't miss ->complex_count != 0, which can be changed under
sem_perm.lock (the "global" lock in the pseudo-code above). This is correct,
but the same equally applies to any other change made under the "global"
lock: it is only because we have that rmb() that we can't miss it.
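
For reference, the fast path I mean looks roughly like this (simplified from
ipc/sem.c; the nsops != 1 case and error handling are elided):

	struct sem *sem = sma->sem_base + sops->sem_num;

	if (sma->complex_count == 0) {
		spin_lock(&sem->lock);
		if (!spin_is_locked(&sma->sem_perm.lock)) {
			/* spin_is_locked() is not a memory barrier */
			smp_rmb();
			/* recheck: it can be changed under sem_perm.lock */
			if (sma->complex_count == 0)
				return sops->sem_num;	/* fast path */
		}
		spin_unlock(&sem->lock);
	}
	/* slow path: ipc_lock_object(&sma->sem_perm) + sem_wait_array() */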

And the same is true for spin_unlock_wait() in the "slow" path.

Again, I do not know; perhaps sem.c is fine. For example, perhaps
sem_perm.lock doesn't need to fully serialize with sem_base[sem_num].lock.
But this is not obvious, and spin_unlock_wait() without a barrier looks
suspicious; at the very least this needs a comment imo.

Especially because it looks as if sem_base[sem_num].lock can't even fully
serialize with itself: sem_lock(nsops => -1) on a 3rd CPU can force one
of the lockers to switch to the "global" lock.
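
Roughly the interleaving I have in mind (hypothetical):

	CPU0: sem_lock(sem_num)		/* fast path, takes sem->lock */
	CPU1: sem_lock(nsops => -1)	/* takes the global sem_perm.lock */
	CPU2: sem_lock(sem_num)		/* sees the global lock held, falls
					   back to the "global" path: CPU0
					   and CPU2 now lock the same
					   semaphore via different locks */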

Oleg.
