Re: spin_unlock optimization(i386)

Erich Boleyn (esboleyn@ichips.intel.com)
Fri, 26 Nov 1999 13:10:43 -0800


Erich Boleyn <esboleyn@ichips.intel.com> wrote:

> The answer isn't that reads are speculating earlier in time willy-nilly.
>
> The answer is the somewhat more subtle one that all processors MUST agree
> on the resulting value for the location in question (since we're doing
> coherent memory). The stores are delayed, so when an "external observation"
> from processor B appears on processor A, and processor A has a conflicting
> store in its queue of committed stores, who wins?
>
> Answer: The last processor to "externally observe" its store.
> (Actually, this isn't quite true; you just need to ensure that one
> of them will win.) I.e. processor A is insisting on the value being
> what it has in its queue of committed but not yet "externally
> observed" stores.
>
> So, I guess we do get a kind of causality violation here.
>
> In the example program, the problem was when the first processor's store
> to "current_state" stomps the one done by the second processor, even
> though that store is LATER in observed order on the first processor.
>
> The LOCKed case worked because it drained the first processor's
> store queue, therefore not leaving anything around to conflict with the
> observed stores from other processors after the critical read.

I double-checked on this for sanity's sake, and this is definitely the
case.

There is no read-around-write reordering going on here, and the stores
are in order.

In the example:

cpu1: Store 1 => A
      Read B

cpu2: Store 0 => B
      Store 0 => A

...and assuming that if you hadn't seen the store to B from cpu2 on
cpu1 yet (i.e. didn't get a 0 for read B), then you'd definitely get
a 0 for A...

that's the problem. cpu1's store into A may still be in its queue of
committed but not-yet-externally-observed stores, so when cpu2's
stores come along, still nicely in order, mind you, then

That's also why the serializing store into A for cpu1 fixes it: you're
simply draining the store queue, and so you're sure that if you observe
the new value of B *later* than the "read B", then the new value of
A can't be messed up by anything in the store queue.

So, as I said before, IA32 isn't reordering anything in a visible way,
it's effectively "shielding" the address on cpu1 because it hasn't
gotten out of its store queue yet.

--
Erich Boleyn
<esboleyn@ichips.intel.com>
PMD IA32 Architecture
Intel
