On Tue, Apr 28, 2015 at 12:58:55PM -0400, Chris Metcalf wrote:
> On 04/28/2015 12:40 PM, Peter Zijlstra wrote:
> > On Tue, Apr 28, 2015 at 11:53:21AM -0400, Chris Metcalf wrote:
> > > The reason we use two 32-bit fields on tilepro is that the only available
> > > atomic instruction is tns (test and set), which sets a 32-bit "1" value
> > > into the target memory and returns the old 32-bit value.
> >
> > And you want a ticket lock as opposed to the test-and-set lock because
> > with 64 tiles starvation under contention is a real worry?
>
> We see substantial unfairness under load with a plain spinlock,
> basically because nearer cores on the mesh network can exponentially
> crowd out further cores. The ticket lock avoids that, though we
> have to be careful to do backoff when checking the lock to avoid
> DDoS in the mesh network.

Does your arch have 16-bit atomic load/stores? If so, would something
like the below not make sense?

typedef struct {
	union {
		struct {
			unsigned short head;
			unsigned short tail;
		};
		unsigned int tickets;
	};
	unsigned int lock;
} arch_spinlock_t;

static inline void ___tns_lock(unsigned int *lock)
{
	while (tns(lock))
		cpu_relax();
}

static inline void ___tns_unlock(unsigned int *lock)
{
	WRITE_ONCE(*lock, 0);
}

static inline void arch_spin_lock(arch_spinlock_t *lock)
{
	unsigned short head, tail;

	___tns_lock(&lock->lock); /* XXX does the TNS imply a ___sync? */
	head = lock->head;
	lock->head++;
	___tns_unlock(&lock->lock);

	while (READ_ONCE(lock->tail) != head)
		cpu_relax();
}

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	/*
	 * can do with regular load/store because the lock owner
	 * is the only one going to do stores to the tail
	 */
	unsigned short tail = READ_ONCE(lock->tail);
	smp_mb(); /* MB is stronger than RELEASE */
	WRITE_ONCE(lock->tail, tail + 1);
}

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	union {
		struct {
			unsigned short head;
			unsigned short tail;
		};
		unsigned int tickets;
	} x;

	for (;;) {
		x.tickets = READ_ONCE(lock->tickets);
		if (x.head == x.tail)
			break;
		cpu_relax();
	}
}
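
For anyone who wants to poke at the idea outside the kernel, below is a
rough user-space model of the lock and unlock paths above. It is only a
sketch under assumptions: C11 atomic_flag stands in for the tns word,
sched_yield() stands in for cpu_relax(), every model_* name is made up
for illustration, and the memory orderings are a guess at what the
scheme needs rather than anything derived from tilepro semantics (the
"does TNS imply a sync" question above is still open).

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
typedef struct {
	_Atomic uint16_t head;	/* next ticket to hand out, guarded by tns */
	_Atomic uint16_t tail;	/* ticket currently being served */
	atomic_flag tns;	/* stand-in for the word that tns operates on */
} model_spinlock_t;

#define MODEL_SPINLOCK_INIT	{ 0, 0, ATOMIC_FLAG_INIT }

static inline void model_spin_lock(model_spinlock_t *lock)
{
	uint16_t ticket;

	/* Inner test-and-set lock, held only long enough to take a ticket. */
	while (atomic_flag_test_and_set_explicit(&lock->tns, memory_order_acquire))
		sched_yield();	/* stand-in for cpu_relax() */

	/* Plain 16-bit load and store; no atomic read-modify-write needed. */
	ticket = atomic_load_explicit(&lock->head, memory_order_relaxed);
	atomic_store_explicit(&lock->head, (uint16_t)(ticket + 1),
			      memory_order_relaxed);

	atomic_flag_clear_explicit(&lock->tns, memory_order_release);

	/* Wait until our ticket is the one being served. */
	while (atomic_load_explicit(&lock->tail, memory_order_acquire) != ticket)
		sched_yield();
}

static inline void model_spin_unlock(model_spinlock_t *lock)
{
	/* Only the lock owner stores to tail, so a load plus store suffices. */
	uint16_t tail = atomic_load_explicit(&lock->tail, memory_order_relaxed);

	/* Unlocking a lock nobody holds would push tail past head. */
	assert(tail != atomic_load_explicit(&lock->head, memory_order_relaxed));

	atomic_store_explicit(&lock->tail, (uint16_t)(tail + 1),
			      memory_order_release);
}
```

The point of the split is that the test-and-set word is only contended
for the few instructions it takes to grab a ticket; the long wait
happens on a read of tail, which is served in ticket order. Both
counters are 16 bits wide and simply wrap around.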