IA32 SMP is Processor Ordered for WB memory (why would you use WT for
normal RAM anyway?), which essentially means the same thing as strong
ordering with the following caveat:
Stores from one processor may not all be seen by other processors
yet, though if any one store is observed by processor A on
processor B, any earlier stores (in program order) from processor
A are guaranteed to be observed by processor B.
In loose terms, each processor has a queue of committed stores that hasn't
made it to the outside world yet. Once stores are observed by a processor,
the normal program ordering constraints apply.
IA32 is strongly ordered for UC (uncached) memory.
You could have memory-ordering #defines... I.e. "#ifdef PROCESSOR_ORDERING",
"#ifdef WEAK_ORDERING", etc... in general as you go to the weaker memory
orderings you have to add more code, anyway.
> Assume this piece of code that is basically the wait_event interface:
>
> unsigned long event_happened = 0;
>
> wait_for_event()
> {
> current_task_state = WANT_TO_GO_TO_SLEEP;
> if (!event_happened)
> schedule(); /* schedule will put the task to sleep if
> current_task_state is set to
> WANT_TO_GO_TO_SLEEP or it will
> return immediatly if current_task_state
> is set to RUNNING */
> }
>
> event_callback() /* this is called when the event happens
> and it may run on a CPU different than
> the one where wait_for_event() runs */
> {
> event_happened = 1;
> current_task_state = RUNNING;
> }
>
> The above interface rely on the ordering as written above.
>
> Even the compiler should be allowed to reorder the above instructions as
> the compiler always assumes that the program is single threaded. So for
> example the event_callback() could be implemented in ASM this way without
> a _compiler_ barrier:
...
> I hope I am been clear enough, if not please ask more details.
>
> So at least a barrier() for the compiler between the read/writes is
> necessary also for IA32.
Make the variables "volatile" and the compiler is required by the ANSI
standard to not reorder them around sequence points (which the end-of-
statement delimiter is).
ALL SMP-visible or memory-mapped hardware variables should in general be
"volatile" to make sure you get exactly what you're asking for from the
compiler.
> What is interesting to know is if the IA32 enforce strong ordering in
> software in SMP across multiple CPUs like if both current_task_state and
> even_happened would been allocated on _not_ cachable memory. If that is
> true then it seems to me we can drop some _more_ lock on the bus from the
> IA32 port... (as first the xchg in set_mb() can be removed and replaced
> with a normal write in IA32 for example). NOTE: set_mb() and in turn
> set_current_state() (that uses set_mb()) is still necessary (for other
> archs like Alpha) in order to write correct (SMP safe) common code of
> course.
In IA32, (UC) Uncacheable memory is Strongly Ordered, compared to the
weaker case of Processor Ordering for Writeback (WB) memory (where stores
can be arbitrarily delayed from the other processors). This means you
generally get the appropriately intuitive result when surrounding, for
example, driver reads/writes (reads/writes to I/O space are also strongly
ordered) with normal loads and stores.
Erich Boleyn
PMD IA32 Architecture
Intel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/