Hi Oleg,

My example was bad, so let's continue with yours.

And: if sem_lock() needs another smp_xmb(), then we must add it.
Some apps do not have a user space hot path, i.e. on some setups we seem to get millions of calls per second.
If there is a race, then it will be hit.

I've tried to merge your example:
> int X = 0, Y = 0;
> void func(void)
> {
> 	bool ll = rand();
> 	if (ll) {
> 		spin_lock(&local);
> 		if (!spin_is_locked(&global))
> 			goto done;
> 		spin_unlock(&local);
> 	}
> 	ll = false;
> 	spin_lock(&global);
> 	spin_unlock_wait(&local);
> done:
> 	smp_rmb(); <<<<<<<<<<<<<<<
> 	BUG_ON(X != Y);
> 	++X; ++Y;
> 	if (ll)
> 		spin_unlock(&local);
> 	else
> 		spin_unlock(&global);
> }
I agree, we need the smp_rmb().
I'll write a patch.

We need the full barrier to serialize STOREs as well, but probably we can
rely on the control dependency and thus we only need rmb().
Do we need a full barrier or not?

I have not managed to come up with a proper line of reasoning.
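
As an aid for reasoning, here is a rough user-space sketch of the example above in C11 atomics. It is only a model under assumptions, not kernel code: model_lock_t, model_lock(), model_unlock(), model_is_locked(), model_unlock_wait() and model_func() are hypothetical stand-ins for the spinlock calls, try_fast replaces rand(), and assert() replaces BUG_ON(). The seq_cst fence inside model_lock() is an assumption that the C11 model needs for the local/global handshake; it says nothing about what spin_lock() guarantees on a given architecture. The acquire fence at "done:" sits where the smp_rmb() goes. Note that a C11 acquire fence orders later loads and later stores after the earlier relaxed load, so it is not an exact equivalent of smp_rmb() and does not settle the question about the stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spinlock. */
typedef struct { atomic_int locked; } model_lock_t;

static model_lock_t local_lock, global_lock;
static int X, Y;	/* only modified while "owning" one of the locks */

static void model_lock(model_lock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;	/* spin */
	/*
	 * Assumption of this model: a full fence, so that the store above
	 * is ordered before the relaxed load of the *other* lock below.
	 */
	atomic_thread_fence(memory_order_seq_cst);
}

static void model_unlock(model_lock_t *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Lockless reads, modelling spin_is_locked() and spin_unlock_wait(). */
static bool model_is_locked(model_lock_t *l)
{
	return atomic_load_explicit(&l->locked, memory_order_relaxed) != 0;
}

static void model_unlock_wait(model_lock_t *l)
{
	while (model_is_locked(l))
		;	/* spin */
}

/* try_fast replaces the rand() of the original example. */
static void model_func(bool try_fast)
{
	bool ll = try_fast;

	if (ll) {
		model_lock(&local_lock);
		if (!model_is_locked(&global_lock))
			goto done;
		model_unlock(&local_lock);
	}
	ll = false;
	model_lock(&global_lock);
	model_unlock_wait(&local_lock);
done:
	/* Counterpart of the smp_rmb() under discussion. */
	atomic_thread_fence(memory_order_acquire);
	assert(X == Y);		/* stands in for BUG_ON(X != Y) */
	++X; ++Y;
	if (ll)
		model_unlock(&local_lock);
	else
		model_unlock(&global_lock);
}
```

Calling model_func() concurrently from several threads (for example created with pthread_create()) with mixed try_fast values would exercise both paths. Removing the acquire fence then gives a variant in which nothing orders the relaxed lock read against the accesses to X and Y.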
