Re: [CFT] faster athlon/duron memory copy implementation

From: Ed Sweetman (ed.sweetman@wmich.edu)
Date: Thu Oct 24 2002 - 15:09:41 EST


Compiler optimizations seem to be affecting these numbers, and I
haven't seen anyone mention them so far. That could be causing
discrepancies with the predicted values. So we would have to look not
only at the algorithms, but also at the compilers and the optimization
flags we plan to use with them. Some algorithms do better with certain
compiler+flag combinations than others. It's a mix-and-match problem
that only gets more complicated the more realistic you make it.

athlon tbird 1133.402MHz, 133MHz fsb, 512MB + 128MB pc133 sdram, no hd

Here are 3 sets of results, each the average of 3 runs

gcc (GCC) 3.2.1 20021020 (Debian prerelease)

flags : -O3 -march=athlon-tbird -mcpu=athlon-tbird -falign-loops=4

copy_page function 'warm up run' took 17577 cycles per page
copy_page function '2.4 non MMX' took 23659 cycles per page
copy_page function '2.4 MMX fallback' took 23894 cycles per page
copy_page function '2.4 MMX version' took 17549 cycles per page
copy_page function 'faster_copy' took 10452 cycles per page
copy_page function 'even_faster' took 10159 cycles per page
copy_page function 'no_prefetch' took 9508 cycles per page

flags : -O0 -march=i686 -mcpu=i686

copy_page function 'warm up run' took 18377 cycles per page
copy_page function '2.4 non MMX' took 23688 cycles per page
copy_page function '2.4 MMX fallback' took 23671 cycles per page
copy_page function '2.4 MMX version' took 18407 cycles per page
copy_page function 'faster_copy' took 10091 cycles per page
copy_page function 'even_faster' took 10283 cycles per page
copy_page function 'no_prefetch' took 9907 cycles per page

gcc 2.95.4

flags : -O0 -march=i686 -mcpu=i686

copy_page function 'warm up run' took 18343 cycles per page
copy_page function '2.4 non MMX' took 23655 cycles per page
copy_page function '2.4 MMX fallback' took 23646 cycles per page
copy_page function '2.4 MMX version' took 18324 cycles per page
copy_page function 'faster_copy' took 10146 cycles per page
copy_page function 'even_faster' took 10438 cycles per page
copy_page function 'no_prefetch' took 9913 cycles per page

----------------------------------------------------------------------

avg difference between the three compiler/flag configurations
(mean of the pairwise absolute differences)
copy_page function 'warm up run' varied by +-533 cycles per page
copy_page function '2.4 non MMX' varied by +-22 cycles per page
copy_page function '2.4 MMX fallback' varied by +-165 cycles per page
copy_page function '2.4 MMX version' varied by +-572 cycles per page
copy_page function 'faster_copy' varied by +-241 cycles per page
copy_page function 'even_faster' varied by +-186 cycles per page
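
A minimal sketch of how such a spread can be computed, assuming the
figure is the mean of the absolute pairwise differences between the
three configurations. The names CYCLES and mean_pairwise_diff are made
up for illustration. Under that assumption it yields 533 for 'warm up
run', 165 for '2.4 MMX fallback' and 270 for 'no_prefetch'.

```python
from itertools import combinations

# Cycles per page copied from the three result sets above, in order:
# gcc 3.2.1 -O3, gcc 3.2.1 -O0, gcc 2.95.4 -O0.
CYCLES = {
    "warm up run":      (17577, 18377, 18343),
    "2.4 non MMX":      (23659, 23688, 23655),
    "2.4 MMX fallback": (23894, 23671, 23646),
    "2.4 MMX version":  (17549, 18407, 18324),
    "faster_copy":      (10452, 10091, 10146),
    "even_faster":      (10159, 10283, 10438),
    "no_prefetch":      (9508, 9907, 9913),
}


def mean_pairwise_diff(values):
    """Mean of |a - b| over every unordered pair of values (assumed metric)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


for name, values in CYCLES.items():
    print(f"copy_page function '{name}' +-{round(mean_pairwise_diff(values))} cycles per page")
```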

Other options may give even greater differences, but it's difficult to
try them all, so I thought this would be enough to demonstrate the
point. No single set of options beat the others on every function;
each did better or worse depending on the function. The fastest
configuration for each function (test1, test2 and test3 being the three
configurations above, in order):

copy_page function 'warm up run' test1
copy_page function '2.4 non MMX' test3
copy_page function '2.4 MMX fallback' test3
copy_page function '2.4 MMX version' test1
copy_page function 'faster_copy' test2
copy_page function 'even_faster' test1
copy_page function 'no_prefetch' test1
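
The list above can be reproduced by picking the lowest cycle count for
each function. This is only a sketch: it assumes test1, test2 and test3
refer to the three result sets in the order they were posted, and the
names CONFIGS, CYCLES and fastest_config are hypothetical.

```python
# Assumed mapping: test1 = gcc 3.2.1 -O3, test2 = gcc 3.2.1 -O0,
# test3 = gcc 2.95.4 -O0 (the order in which the results were posted).
CONFIGS = ("test1", "test2", "test3")

# Cycles per page, one value per configuration, copied from the post.
CYCLES = {
    "warm up run":      (17577, 18377, 18343),
    "2.4 non MMX":      (23659, 23688, 23655),
    "2.4 MMX fallback": (23894, 23671, 23646),
    "2.4 MMX version":  (17549, 18407, 18324),
    "faster_copy":      (10452, 10091, 10146),
    "even_faster":      (10159, 10283, 10438),
    "no_prefetch":      (9508, 9907, 9913),
}


def fastest_config(values):
    """Return the label of the configuration with the fewest cycles per page."""
    return CONFIGS[values.index(min(values))]


for name, values in CYCLES.items():
    print(f"copy_page function '{name}' {fastest_config(values)}")
```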

bandwidth test1 = 488.3MB/sec (not ddr like other setups)
bandwidth test2 = 468.6MB/sec
bandwidth test3 = 468.3MB/sec
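
For reference, a sketch of how cycles per page convert to bandwidth. It
assumes a 4096-byte page, 1MB = 10^6 bytes, and that the figures come
from the 'no_prefetch' timings; none of that is stated above, but under
those assumptions the conversion gives 488.3, 468.6 and 468.3MB/sec.
The names CPU_HZ, PAGE_BYTES and bandwidth_mb_per_sec are made up.

```python
CPU_HZ = 1133.402e6   # clock speed quoted in the hardware line above
PAGE_BYTES = 4096     # assumption: standard 4KB x86 page


def bandwidth_mb_per_sec(cycles_per_page):
    """Convert cycles per copied page to MB/sec (assuming 1MB = 10**6 bytes)."""
    seconds_per_page = cycles_per_page / CPU_HZ
    return PAGE_BYTES / seconds_per_page / 1e6


# 'no_prefetch' cycles per page for each of the three configurations.
for label, cycles in (("test1", 9508), ("test2", 9907), ("test3", 9913)):
    print(f"bandwidth {label} = {bandwidth_mb_per_sec(cycles):.1f}MB/sec")
```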
