On Mon, 28 Aug 2000, David S. Miller wrote:
>load_locked+store_conditional on Alpha and MIPS. Did you check what
>semantics Alpha guarentees in the reference manual (or more
>importantly what the Alpha cpus really do)? I know you have a copy of
>it and do read it Andrea ;-)))
Alpha definitely needs mb(), ld_l/st_c doesn't imply any memory barrier
(yes, alpha is very aggressive in SMP 8).
Just to make a fun example if the virtual address of ld_l/st_c are
different but within the same 16bytes natuarlly aligned block you have to
put an mb() in __between__ ld_l/st_c to make sure that they are not
reordered and that none write happened in between from another CPU.
Here the docs about what I said:
[..]
A processor executes a LDx_L/STx_C sequence and includes an MB
between the LDx_L to a particular address and the successful STx_C
to a different address (one that meets the constraints required
for predictable behavior). That instruction sequence establishes
an access order under which a store operation by another
pro-cessor to that lock range occurs before the LDx_L or after the
STx_C.
[..]
If the address specified by a STx_C instruction does not match the one
given in the preceding LDx_L instruction, an MB is required to guarantee
ordering between the two instructions.
[..]
Then there's basically the asm spinlock implementation that is just like
our linux spin_lock/spin_unlock that does:
"1: ldl_l %0,%1\n"
" blbs %0,2f\n"
" or %0,1,%0\n"
" stl_c %0,%1\n"
" beq %0,2f\n"
" mb\n"
and it adds the mb at the end as the linux spinlocks does and as the linux
test_and_set/clear_bit does too.
If ll/sc would imply a memory barrier we wouldn't need the mb at the end.
Then some other docs (it doesn't refer explicitly to ll/sc but it's quite
obvious they are included too).
[..]
5.6.3 Implied Barriers
There are no implied barriers in Alpha. If an implied
barrier is needed for functionally correct access to
shared data, it must be written as an explicit
instruction. (Software must explicitly include any needed
MB, WMB, or CALL_PAL IMB instructions.) Alpha transitions
such as the following have no built-in implied memory
barriers: " Entry to PALcode " Sending and receiving
interrupts " Returning from exceptions, interrupts, or
machine checks " Swapping context " Invalidating the
Translation Buffer (TB) Depending on implementation
choices for maintaining cache coherency, some
PALcode/cache implementations may have an implied CALL_PAL
IMB in the I-stream TB fill routine, but this is
transparent to the non-PALcode programmer.
[..]
>I really cannot imagine a chip allowing the waitqueue_empty read to
>occur before the ll/sc sequence, that would make no sense at all and
>be very difficult silicon to even implement.
I think it's the other way around, I think it's simpler to not do any
magic barrier for that instructions from an hardware point of view. In the
alpha there's just a per cpu lock_flag and lock_physical_address
registers. ld_l set the lock_flag to 1 and sets the lock_physical_address
to the address that we want to monitor for changes from under us. Each
write or successful-write_conditional or wh64 (used for dropping not
interesting cachelines) that is run on the range of locked address on any
cpu will clear the lock flag for all the CPUs that wanted to monitor the
16byte block for changes. If the lock_flag is clear in the current cpu the
next st_c will fail. I think it's just an hardware logic (strightforwad
scalable and very efficient) that works indipendently of the speculative
read or write buffer parts of the cpu.
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:23 EST