Re: spin_unlock optimization(i386)

Oliver Xymoron (oxymoron@waste.org)
Thu, 25 Nov 1999 12:30:43 -0600 (CST)


On Thu, 25 Nov 1999, Erich Boleyn wrote:

> > What you can't be sure of is that things after the write haven't already
> > occurred - speculative execution of reads past the end of a critical
> > section or into the beginning of another:
> >
> > CPU 1                 CPU 2
> > write data,shared
> > write 0,lock
> >                       bts lock
> >                       jc 2
> >                       read shared
> >
> > Here, shared is guaranteed to be written in program order, that is before
> > lock. But we have no guarantee that CPU2's read doesn't speculatively
> > happen before the lock grab, which means causality here is not preserved.
>
> All reads/writes happen in program order with respect to the observed
> memory image.
>
> If anything changes to the inputs of a speculatively executed instruction
> prior to becoming committed state, then that work is thrown out. There is
> no causality violation.
>
> Specifically, since "lock" and "shared" are observed in order, and since
> IA32 executes in program order with respect to the observed memory image,
> then when the change to "lock" is observed for the "bts lock", the
> change to "shared" MUST be.

Are you sure? I don't see why the shared read can't be speculatively done
before the lock read+write. The memory ordering rules _do_ say that reads
can pass writes and correctness is only guaranteed when the write is to
the same location _on the same processor_. I don't see anything that
suggests a write on CPU 1 throws out a speculative read result on CPU 2.
Unless there's new snooping magic on PIIs and above.

So, theoretically (though not likely in practice!), CPU 2 can
speculatively read shared, CPU 1 can then write shared and free the lock,
and CPU 2 can then grab the lock while still holding the stale value it
read earlier. All writes are still observed in order, but there's still a
potential race.
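
To make the handoff concrete, here is a rough sketch of the two sides in
portable C. It assumes C11 <stdatomic.h>, which is not what the kernel code
under discussion uses; the names lock_word, cpu1_publish and
cpu2_try_consume are made up for illustration. The acquire ordering on the
exchange is what forbids the read of shared from being satisfied before the
lock grab, which is the guarantee being argued about here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int shared;                 /* data protected by the lock        */
static atomic_int lock_word = 1;   /* 1 = held by CPU 1, 0 = free       */

/* CPU 1: write the data, then free the lock ("write 0,lock"). */
void cpu1_publish(int value)
{
    shared = value;
    /* Release: the store to shared is visible before the lock reads 0. */
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

/* CPU 2: try to grab the lock ("bts lock; jc"), then read the data. */
bool cpu2_try_consume(int *out)
{
    /* Atomic read-modify-write, standing in for the locked bts.
     * Acquire: the read of shared below may not be hoisted above it. */
    int old = atomic_exchange_explicit(&lock_word, 1, memory_order_acquire);
    if (old != 0)
        return false;              /* lock was already held, try again  */
    *out = shared;
    return true;
}
```

If the exchange were replaced by a plain load and store with no ordering,
nothing in the language would stop the read of shared from moving ahead of
the lock grab, which is the theoretical race described above.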

> > Which is ok, of course, since we need to force atomicity of the lock grab
> > anyway and the way to do that is with the LOCK prefix, which ends up
> > serializing memory access.
>
> I believe LOCK# doesn't serialize memory accesses, it just prevents other
> bus readers/writers from getting access to that location for the duration
> of the operation in question.

Here's what my PPro manual has to say in the Multiple Processor
Management section:

    Locked operations are serializing. They wait for all previous
    instructions to complete. Locked operations are atomic with respect
    to all other memory operations and all externally visible
    events. Only instruction fetch and page table accesses can pass
    locked instructions.

Combined with my interpretation of the read/write ordering rules,
serializing for LOCK is the right thing to do - LOCK is primarily useful
for synchronizing data between processors.

At any rate, the LOCK is required for atomicity. And if either of us is
right about ordering and serialization, the current spin_lock does the
right thing.
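
For illustration only, the shape of such a lock can be sketched in portable
C. This is not the kernel's spin_lock; it assumes C11 <stdatomic.h>, and the
sketch_* names are hypothetical. The point is just that the grab is an
atomic read-modify-write (on x86, compilers typically emit a locked
instruction or an implicitly locked xchg for it), while the release is only
a store with release ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } sketch_spinlock;
#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt at the lock: atomic test-and-set, like "lock; bts". */
bool sketch_spin_trylock(sketch_spinlock *l)
{
    /* Returns the previous value, so "was clear" means we now own it. */
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Spin until the test-and-set succeeds. */
void sketch_spin_lock(sketch_spinlock *l)
{
    while (!sketch_spin_trylock(l))
        ;                          /* busy-wait */
}

/* Free the lock: a store with release ordering, no read-modify-write. */
void sketch_spin_unlock(sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would bracket its critical section with sketch_spin_lock() and
sketch_spin_unlock(); whether the unlock side can safely be a plain write on
a given processor is the question this thread is about.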

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.." 
