I noticed your binary ran with N=2000000, which according to the
documentation in the STREAM FAQ is only sufficient for a 2-processor,
1 MB-cache Opteron box.

It does not seem to make any difference.

I also noticed wide variation in results (25% or so) when running with
4 threads on a 4-processor Opteron under linux-2.6.5-mm5. Can you
provide me with the specs of the system you ran your tests on?

Yes, -mm5 is still broken because it has the "tuned to numasaurus" NUMA
scheduler. Run it on a standard (non-mm) kernel, or with Ingo's early
load-balance patch applied.