On Tue, 2013-10-01 at 21:25 -0400, Waiman Long wrote:
> If the lock and unlock functions are done right, there should be no
> overlap of critical sections. So it is the job of the lock/unlock
> functions to make sure that critical section code won't leak out.
> There should be some kind of memory barrier at the beginning of the
> lock function and the end of the unlock function.
>
> The critical section is also likely to have branches. The CPU may
> speculatively execute code on the 2 branches, but one of them will be
> discarded once the branch condition is known. Also,
> arch_mutex_cpu_relax() is a compiler barrier by itself. So we may not
> need a barrier() after all. The while statement is a branch
> instruction; any code after that can only be speculatively executed
> and cannot be committed until the branch is done.

But the condition code may be checked after speculative execution?
The condition may not be true during speculative execution and only
turns true when we check the condition and take that branch?

The thing that bothers me is that without a memory barrier after the
while statement, we could speculatively execute critical section code
before affirming that the lock is in the acquired state. Then, when we
check the lock, it has been set to the acquired state in the meantime.
We could be loading some memory entry *before* node->locked has been
set true. I think an smp_rmb() (if not an smp_mb()) should be placed
after the while statement.
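
To make the concern concrete, here is a rough user-space sketch of an
MCS-style lock written with C11 atomics rather than the kernel
primitives. It is not the patch under discussion: the names mcs_node,
mcs_lock, mcs_lock_acquire and mcs_lock_release are hypothetical, and the
acquire fence after the spin loop only approximates the barrier proposed
above (it is somewhat stronger than smp_rmb()).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's MCS structures. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *prev;

    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&node->locked, false, memory_order_relaxed);

    /* Join the queue; an empty queue means we own the lock right away. */
    prev = atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
    if (prev == NULL)
        return;

    /* Tell the previous holder who to hand the lock to. */
    atomic_store_explicit(&prev->next, node, memory_order_release);

    /* Spin until the previous holder sets node->locked. */
    while (!atomic_load_explicit(&node->locked, memory_order_relaxed))
        ;

    /*
     * Barrier after the while statement: without it, loads that belong
     * to the critical section could be satisfied before the load that
     * observed node->locked == true.
     */
    atomic_thread_fence(memory_order_acquire);
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_acquire);

    if (next == NULL) {
        struct mcs_node *expected = node;

        /* No successor visible: try to mark the queue empty. */
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
                NULL, memory_order_acq_rel, memory_order_acquire))
            return;

        /* A successor is enqueueing; wait for it to link itself in. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /* Release store: critical section writes become visible first. */
    atomic_store_explicit(&next->locked, true, memory_order_release);
}
```

The release store in mcs_lock_release pairs with the fence after the
spin loop, so the waiter cannot see stale critical section data once it
has seen locked == true.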