Re: [PATCH] fix a race condition in cancelable mcs spinlocks

From: John David Anglin
Date: Mon Jun 02 2014 - 11:39:31 EST


On 6/2/2014 10:02 AM, Mikulas Patocka wrote:

On Mon, 2 Jun 2014, Mikulas Patocka wrote:


On Sun, 1 Jun 2014, John David Anglin wrote:

On 1-Jun-14, at 3:20 PM, Peter Zijlstra wrote:

If you write to some variable with ACCESS_ONCE and use cmpxchg or xchg at
the same time, you break it. ACCESS_ONCE doesn't take the hashed spinlock,
so, in this case, cmpxchg or xchg isn't really atomic at all.
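
A minimal sketch of the problem described above, assuming a cmpxchg that is emulated with a hashed lock table. It is not the parisc kernel code: the table size, the hash, and every name here (hashed_locks, emulated_cmpxchg, racing_store, lost_store_demo) are hypothetical. The racing_store hook models another CPU doing a plain store that never takes the hashed lock, so the interleaving is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical lock table; the size and the hash are illustrative only. */
#define NLOCKS 4
static pthread_mutex_t hashed_locks[NLOCKS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

static pthread_mutex_t *lock_for(volatile int *addr)
{
    return &hashed_locks[((uintptr_t)addr >> 4) % NLOCKS];
}

/*
 * Emulated compare-and-exchange. It is atomic only against other callers
 * that take the same hashed lock. racing_store, if non-NULL, runs between
 * the load and the store to model a plain write from another CPU.
 */
int emulated_cmpxchg(volatile int *p, int old, int new_val,
                     void (*racing_store)(volatile int *))
{
    pthread_mutex_t *lock = lock_for(p);
    int seen;

    pthread_mutex_lock(lock);
    seen = *p;                  /* load */
    if (racing_store)
        racing_store(p);        /* the plain store slips in here */
    if (seen == old)
        *p = new_val;           /* store decided on a stale load */
    pthread_mutex_unlock(lock);
    return seen;
}

/* Stand-in for an ACCESS_ONCE-style write: no lock is taken. */
static void plain_store_2(volatile int *p)
{
    *p = 2;
}

/* Returns the final value of the word after the modeled interleaving. */
int lost_store_demo(void)
{
    volatile int word = 0;
    int seen = emulated_cmpxchg(&word, 0, 1, plain_store_2);

    assert(seen == 0);          /* the cmpxchg reports success */
    return word;                /* the plain store of 2 has been overwritten */
}
```

With a truly atomic cmpxchg the plain store would land either before it (the compare fails and the word stays 2) or after it (the word ends up 2). Here the word ends up 1, so the plain store is lost.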
And this is really the first place in the kernel that breaks like this?
I've been using xchg() and cmpxchg() without such consideration for
quite a while.
I believe Mikulas is correct. Even in a controlled situation where a
cmpxchg operation is used to implement pthread_spin_lock() in userspace,
we found recently that the lock must be released with a cmpxchg
operation and not a simple write on SMP systems. There is a race in the
cache operations or instruction ordering that's not present with the
ldcw instruction.

Dave
--
John David Anglin dave.anglin@xxxxxxxx
That is strange.

Spinlock with cmpxchg on lock and a single write on unlock should work,
assuming that cmpxchg doesn't write to the target address when it detects
mismatch (the cmpxchg in the kernel syscall page doesn't do it, it
nullifies the write instruction on mismatch).
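
The scheme described above can be sketched with C11 atomics as follows. This assumes a target with a native compare-and-swap; on parisc the cmpxchg is instead emulated through the kernel helper, so this only illustrates the intended protocol, not the failing implementation. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = free, 1 = held */
typedef struct {
    atomic_int word;
} sketch_spinlock;

void sketch_init(sketch_spinlock *l)
{
    atomic_init(&l->word, 0);
}

/* One attempt: returns 1 if the lock was taken, 0 if it was already held. */
int sketch_trylock(sketch_spinlock *l)
{
    int expected = 0;

    /* On mismatch the CAS does not write to the lock word. */
    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

void sketch_lock(sketch_spinlock *l)
{
    int expected = 0;

    while (!atomic_compare_exchange_weak_explicit(
               &l->word, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;   /* a failed CAS stores the observed value here */
}

/* Unlock is a single store with release ordering, not a cmpxchg. */
void sketch_unlock(sketch_spinlock *l)
{
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

The release ordering on the unlock store is what keeps the critical section from leaking past the unlock on a weakly ordered machine.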

Do you have some code that reproduces this misbehavior?
There is a pthread_spin_lock test in the kyotocabinet package that reproduces
this misbehavior. Essentially, it creates four threads which loop doing pthread_spin_lock(),
sched_yield() and then pthread_spin_unlock(). On SMP systems, the test hangs with
the pthread_spin_lock locked and no thread holding the lock (i.e., the unlock failed).
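
A rough sketch of a test with that shape, not the kyotocabinet test itself. The iteration count, the shared counter, the holders check, and the names worker and run_spin_test are assumptions added for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define MAX_THREADS 16

static pthread_spinlock_t test_lock;
static long shared_counter;          /* protected by test_lock */
static int holders;                  /* protected by test_lock */
static int iterations_per_thread;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < iterations_per_thread; i++) {
        pthread_spin_lock(&test_lock);
        holders++;
        assert(holders == 1);        /* mutual exclusion must hold */
        sched_yield();               /* give up the CPU while holding the lock */
        shared_counter++;
        holders--;
        pthread_spin_unlock(&test_lock);
    }
    return NULL;
}

/* Returns the total number of lock/unlock rounds, or -1 on setup failure. */
long run_spin_test(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    shared_counter = 0;
    holders = 0;
    iterations_per_thread = iters;
    if (pthread_spin_init(&test_lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_spin_destroy(&test_lock);
    return started == nthreads ? shared_counter : -1;
}
```

Calling run_spin_test(4, iters) matches the four-thread setup described above. With a correct spinlock it returns 4 * iters; a hang of the kind reported would show up as pthread_join() never returning.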

The pthread support uses the cmpxchg code in arch/parisc/kernel/syscall.S. This uses
"hashed" locks, etc, in a manner similar to the kernel code.


We really need to find out why it behaves this way:
- is PA-RISC really out of order? (we used to believe that it is in-order
and we have empty barrier instructions in the kernel). Does adding the
"SYNC" instruction before the write in pthread_spin_unlock fix it?
I tried the "SYNC" instruction both before the write and after the cmpxchg
operation. In the cmpxchg operation, I also tried it with a cache flush. I was trying to
simulate ldcw behavior.
- does the processor perform nullified writes unconditionally? Does
moving the write in the cmpxchg implementation from the nullified slot
to its own branch fix it?
I don't see how the processor can perform nullified writes unconditionally, although that
might explain the observed symptom. I didn't try moving the cmpxchg write.

- does adding a dummy "ldcw" instruction to an unrelated address fix it?
Is it that "ldcw" has some magic barrier properties?
I had wondered about that. One can't use %r0 as the instruction target as the architecture
manual says that it may then be implemented as a normal load. "ldcw" definitely has some magic
cache and barrier properties. A normal store definitely works with it to reset the semaphore.
- and there is "stw,o" instruction that does ordered store according to
the specification, so we should test it too...
This doesn't help.

Currently, the Debian eglibc has a pthread_spin_unlock.diff patch that resolves the
kyotocabinet bug. See:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=725508


I think we need to perform these tests and maybe some more to find out
what really happened there...

BTW. in Debian 5 libc 2.7, pthread_spin_lock uses ldcw and
pthread_spin_unlock uses a single write (just like the kernel spinlock
implementation). In Debian-ports libc 2.18, both pthread_spin_lock and
pthread_spin_unlock call the kernel syscall page. What was the reason for
switching to a less efficient implementation?

Mikulas



Dave

--
John David Anglin dave.anglin@xxxxxxxx
