21.08.2013, 07:01, "Waiman Long"<waiman.long@xxxxxx>:On 08/20/2013 11:31 AM, Alexander Fyodorov wrote:Linux kernel does not support architectures without cache coherency, and while using memory barriers just for flushing write buffers ASAP on cache-coherent processors might benefit performance on some architectures it will hurt performance on others. So it must not be done in architecture-independent code.Isn't a race possible if another thread acquires the spinlock in the windowThe purpose of smp_wmb() is to make sure that content in the cache will
between setting lock->locked to 0 and issuing smp_wmb()? Writes from
the critical section from our thread might be delayed behind the write to
lock->locked if the corresponding cache bank is busy.
be flushed out to the memory in case the cache coherency protocol cannot
guarantee a single view of memory content on all processor.
In otherNo, I was talking about the window between freeing lock and issuing smp_wmb(). What I meant is:
word, smp_wmb() is used to make other processors see that the lock has
been freed ASAP. If another processor see that before smp_wmb(), it will
be even better as the latency is reduced. As the lock holder is the only
one that can release the lock, there is no race condition here.
1) A = 0
2) CPU0 locks the spinlock protecting A.
3) CPU0 writes 1 to A, but the write gets stalled because the corresponding cache bank is busy.
4) CPU0 calls spin_unlock() and sets lock->locked to 0.
If CPU1 does a spin_lock() right now, it will succeed (since lock->locked == 0). But the write to A hasn't reached cache yet, so CPU1 will see A == 0.
More examples on this are in Documentation/memory-barriers.txt
That is a legitimate question. I don't think it is a problem on x86 asIt does because "lock" prefix implies a full memory barrier.
the x86 spinlock code doesn't do a full mb() in the lock and unlock
paths.
The smp_mb() will be conditionalized depending on the ARCH_QSPINLOCKDo you mean because of inline? That shouldn't be a problem because smp_mb() prohibits compiler from doing any optimizations across the barrier (thanks to the "volatile" keyword).
setting. The smp_wmb() may not be needed, but a compiler barrier should
still be there.
More on this in Documentation/memory-barriers.txt
If compiler optimized memory access away it would be a bug. And I'm not so sure about overhead... For example, on some VLIW architectures ACCESS_ONCE() might prohibit compiler from mixing other instructions to the unlock.Also I don't understand why there are so many uses of ACCESS_ONCE()All the lock/unlock code can be inlined and we don't know what the
macro. It does not guarantee memory ordering with regard to other CPUs,
so probably most of the uses can be removed (with exception of
lock_is_contended(), where it prohibits gcc from optimizing the loop away).
compiler will do to optimize code. The ACCESS_ONCE() macro is used to
make sure that the compiler won't optimize away the actual fetch or
write of the memory. Even if the compiler won't optimize away the memory
access, adding the ACCESS_ONCE() macro won't have any extra overhead. So
a more liberal use of it won't hurt performance.