I believe the P5 MMX unit is also somewhat pipelined.
8-byte alignment is very important, but perhaps equally important
(according to an MMX conference I attended a couple of years ago)
is the order in which you read and write within cache lines.
Because the P5 doesn't allocate a cache line on a write miss (the write
goes straight out to the bus) while the PII does write-allocate, I'd
expect the read and write ordering to be a lot more critical for the P5.
This means you have to (1) be cache-line aligned, not just 8-byte
aligned; and (2) get the access order right, don't ask me what it is
though :-)
Really, the right MMX checksum on the P5 should max out the memory
busses. The overhead would be in the FPU context save/restore and
switching to MMX mode.
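
To make the alignment point concrete, here's a rough sketch in plain C
(no MMX) of splitting a buffer into an unaligned head, a run of whole
cache lines, and a tail, so the fast loop only ever touches complete,
aligned lines. The names is_aligned, line_split and split_at_cache_lines
are made up for illustration, and the 32-byte line size is an assumption
that matches the P5 and PII L1 caches. It says nothing about the right
access order within a line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 cache line size: 32 bytes on P5 and PII. */
#define CACHE_LINE 32u

/* Hypothetical helper type: how a buffer divides around line boundaries. */
struct line_split {
    size_t head;  /* bytes before the first cache-line boundary */
    size_t body;  /* bytes that form whole, aligned cache lines */
    size_t tail;  /* leftover bytes after the last whole line */
};

/* Returns nonzero if p is a multiple of align (align must be a power of 2). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Work out head/body/tail sizes for len bytes starting at p. */
static struct line_split split_at_cache_lines(const void *p, size_t len)
{
    struct line_split s;
    size_t misalign = (uintptr_t)p & (CACHE_LINE - 1);

    s.head = misalign ? CACHE_LINE - misalign : 0;
    if (s.head > len)
        s.head = len;               /* buffer ends before the boundary */
    s.body = (len - s.head) & ~(size_t)(CACHE_LINE - 1);
    s.tail = len - s.head - s.body;
    return s;
}
```

A checksum routine would handle head and tail with a slow scalar loop
and run the MMX loop over body only, one full line per iteration.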
-- Jamie