Re: Linux spin_unlock debate (fwd)

From: Oliver Xymoron (oxymoron@waste.org)
Date: Mon Apr 24 2000 - 21:41:18 EST


Not only have I now gotten positive test results for PPro steppings back
to 1, I have confirmation from Intel folks. Thanks, Andy.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 

---------- Forwarded message ---------- Date: Mon, 24 Apr 2000 16:20:13 -0500 From: Andy Glew <glew@cs.wisc.edu> To: Oliver Xymoron <oxymoron@waste.org> Cc: "Neiger, Gil" <gil.neiger@intel.com> Subject: Re: Linux spin_unlock debate

> Linus has the impression that movb is unsafe for unlocking, at least on > some early P6 chips, even though it's a huge performance win. He cited you > as a source. I've read through all the PPro errata and can't find anything > that would cause a problem. Further, I've built a test program (below) for > SMP systems that ensures the ordering of writes is preserved across the > bus and it's worked on all the systems people have tested it on. Can you > think of any reason why the below should be unsafe (triggering the printf > in racefunc()) or are there other Intel people I should talk to about it?

Use MOVB already!!!! Tell Linus I told you so!!!! (I've passed this on to about a dozen LINUX people. Send me Linus' email address, and I'll tell him myself. Forward this on, whatever.)

I know why this misunderstanding happened:

Back in 90/91, Intel had a "Memory Ordering Task Force", whose recommendation was that Intel adopt a weakly ordered memory consistency model. To make this work, you have to be able to identify lock releases. Lacking a special "RELEASE LOCK" instruction - on the i486 you could apply the LOCK prefix to an ordinary write, not just a read-modify-write -- this task force said that locks should be released by a locked read-modify-write. Like, and XCHG that wrote zero.

That's still official Intel policy.

However, the P6 processor decided to go for "speculative processir consistency" - we snoop the bus appropriately, and are *not* weakly consistent. Indeed. it would be close to impossible to build a weakly consistent P6 system, without having to require special external hardware.

So, on P6 (and Willamette, I believe) it is correct to use an ordinary write to release the lock.

Now, there *was* a bug on early P6 silicon in this regard. But it would not bite you so long as you used a locked read-modify-write to acquire the lock.

So: if you are willing to write code that should work correctly on every x86 multiprocessor to date (except for some of the earliest i386 and i486 MPs that were weakly ordered) you can use MOVB to release the lock.

Now, I'm in absolutely no position to say that, but I think that I can reasonably say that, if Intel ever builds a MP-capable system that is weakly ordered, there will be a CPUID bit indicating it. In fact, I think it;'s time that we defined what it is - one of the bits that currently reads as 0 - so that you can write forward looking code. I'll see if I can get that defined for you. For now, check that CPUID indicates a P6 or a earlier, or whatever Willamette turns out to be.

Caveat: although Intel's "glueless MP" works as I have described, it is always possible for a system vendor to break the consistency model, e.g. by reordering stores in external inter-bus bridges. So, don't abandon the conservative code - just don't make it the default, since 90% of the x86 MP systems are processor consistent.

Caveat #2: IA64 Merced systems are not, IIRC, processor consistent. (Fortunately, they do define an efficient lock release.) I'm CC'ing Gil Neiger on this - he took over the role of :"memory model maven" when I left Intel, knows what the IA64 memory model is, and will undoubtedly correct any errors in my email.

BOTTOM LINE: use MOVB already!!!!

> > > /* > > movb.c > > Compile with: > > cc -O2 -o movb movb.c -lpthread > > undefine MOVB below to test the old spin_unlock variant > > */ > > #include <stdio.h> > #include <pthread.h> > #include <assert.h> > > #define MOVB > > #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory") > > typedef struct { unsigned long a[100]; } __dummy_lock_t; > #define __dummy_lock(lock) (*(__dummy_lock_t *)(lock)) > > typedef struct { > volatile unsigned int lock; > #if SPINLOCK_DEBUG > unsigned magic; > #endif > } spinlock_t; > > #if SPINLOCK_DEBUG > #define SPINLOCK_MAGIC_INIT , SPINLOCK_MAGIC > #else > #define SPINLOCK_MAGIC_INIT /* */ > #endif > > #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 SPINLOCK_MAGIC_INIT } > > #define spin_lock_init(x) do { *(x) = SPIN_LOCK_UNLOCKED; } while(0) > > #define spin_lock_string \ > "\n1:\t" \ > "lock ; btsl $0,%0\n\t" \ > "jc 2f\n" \ > ".section .text.lock,\"ax\"\n" \ > "2:\t" \ > "testb $1,%0\n\t" \ > "rep;nop\n\t" \ > "jne 2b\n\t" \ > "jmp 1b\n" \ > ".previous" > > #ifdef MOVB > #define spin_unlock_string \ > "movb $0,%0" > #else > #define spin_unlock_string \ > "lock ; btrl $0,%0" > #endif > > extern inline void spin_lock(spinlock_t *lock) > { > #if SPINLOCK_DEBUG > __label__ here; > here: > if (lock->magic != SPINLOCK_MAGIC) { > printk("eip: %p\n", &&here); > BUG(); > } > #endif > __asm__ __volatile__( > spin_lock_string > :"=m" (__dummy_lock(lock))); > } > > extern inline void spin_unlock(spinlock_t *lock) > { > #if SPINLOCK_DEBUG > if (lock->magic != SPINLOCK_MAGIC) > BUG(); > if (!lock->lock) > BUG(); > #endif > __asm__ __volatile__( > spin_unlock_string > :"=m" (__dummy_lock(lock))); > } > > typedef void *threadfunc(void *); > > void start_thread(threadfunc *f) > { > pthread_t thread; > int res; > > res = pthread_create(&thread,NULL,f,NULL); > > if(res != 0) > assert(0); > } > > static spinlock_t testlock = SPIN_LOCK_UNLOCKED; > volatile int state=0; > > void racefunc() > { > int i, j; > volatile int delay=50; > > spin_lock(&testlock); > > i=state; > if(i) { > printf("state was %d!\n", i); > exit(-1); > } > > /* force change of state on bus */ > state=1; > mb(); > for(j=0;j<delay;j++) > ; > > /* swap next two lines to demonstrate race */ > state=0; > spin_unlock(&testlock); > } > > void * cpu1(void *param) > { > int i, j; > volatile int delay=1; > > for(i=0; i<5000000; i++) { > > for(j=0;j<delay;j++) > ; > > racefunc(); > > if((i%5000)==0) { > printf("delay %d: ok\n",delay); > delay++; > } > } > > printf("thread %d finished.\n",(int)param); > exit(0); > } > > void* cpu2(void* param) > { > volatile int delay2 = 300; > int i=0,j; > > for(;;) { > for(j=0;j<delay2;j++) > ; > > racefunc(); > } > } > > int main() > { > printf("movb:\n"); > printf(" starting, please wait.\n"); > start_thread(cpu2); > cpu1(0); > } > > -- > "Love the dolphins," she advised him. "Write by W.A.S.T.E.." > >

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Apr 30 2000 - 21:00:08 EST