Re: spin_unlock optimization(i386)

Gerard Roudier (groudier@club-internet.fr)
Wed, 24 Nov 1999 21:51:54 +0100 (MET)


Thanks for the explanation.

But, BTW, this seemed not to have been clear for the authors of your
system programming guide. May-be you should make sure that the below is
also well known at Intel. But my english is not so good and it may have
happened that I missed something important from the guide. If it is the
case, you may want to let us know what part of the doc we should not have
missed.

Thanks in advance.

Gérard.

PS: I didn't see the correct explanation from the guys that claimed that
the single store was just fine for the spin_unlock(). This seems
strange to me given that I cannot beleive such a knowledge to be
required to be innate.

On Tue, 23 Nov 1999, Erich Boleyn wrote:

> Greetings, everyone. I am an Architect in an IA32 development group
> at Intel.
>
> I haven't seen all of the messages in this thread, but am responding
> to this one because there appears to be a misconception here.
>
> Linus Torvalds wrote:
>
> > As a completely made-up example (which will probably never show the
> > problem in real life, but is instructive as an example), imaging running
> > the following test in a loop on multiple CPU's:
>
> > int test_locking(void)
> > {
> > static int a; /* protected by spinlock */
> > int b;
> >
> > spin_lock()
> > a = 1;
> > mb();
> > a = 0;
> > mb();
> > b = a;
> > spin_unlock();
> > return b;
> > }
>
> > Now, OBVIOUSLY the above always has to return 0, right? All accesses to
> > "a" are inside the spinlock, and we always set it to zero before we read
> > it into "b" and return it. So if we EVER returned anything else, the
> > spinlock would obviously be completely broken, wouldn't you say?
> >
> > And yes, the above CAN return 1 with the proposed optimization. I doubt
> > you can make it do so in real life, but hey, add another access to another
> > variable in the same cache line that is accessed through another spinlock
> > (to get cache-line ping-pong and timing effects), and I suspect you can
> > make it happen even with a simple example like the above.
> >
> > The reason it can return 1 quite legally is that your new "spin_unlock()"
> > isnot serializing any more, so there is very little effective ordering
> > between the two actions
>
> It will always return 0. You don't need "spin_unlock()" to be serializing.
>
> The only thing you need is to make sure there is a store in "spin_unlock()",
> and that is kind of true by the fact that you're changing something to be
> observable on other processors.
>
> The reason for this is that stores can only possibly be observed when all
> prior instructions have retired (i.e. the store is not sent outside of the
> processor until it is committed state, and the earlier instructions are
> already committed by that time), so the any loads, stores, etc absolutely
> have to have completed first, cache-miss or not.
>
>
> > b = a;
> > spin_unlock();
> >
> > as they access completely different data (ie no data dependencies in
> > sight). So what you could end up doing is equivalent to
> >
> > CPU#1 CPU#2
> >
> > b = a; /* cache miss, we'll delay this.. */
> > spin_unlock();
> >
> > spin_lock();
> > a = 1;
>
> Since the instructions for the store in the spin_unlock have to have been
> externally observed for spin_lock to be aquired (presuming a correctly
> functioning spinlock, of course), then the earlier instructions to set "b"
> to the value of "a" have to have completed first.
>
> In general, IA32 is Processor Ordered for cacheable accesses. Speculation
> doesn't affect this. Also, stores are not observed speculatively on other
> processors.
>
>
> Erich Boleyn
> PMD IA32 Architecture
> Intel
>
>
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/