Re: x86 memcpy performance

From: Valdis . Kletnieks
Date: Mon Aug 15 2011 - 22:36:51 EST


On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:

> Benchmarking with 10000 iterations, average results:
> size XM MM speedup
> 119 540.58 449.491 0.8314969419

> 12273 2307.86 4042.88 1.751787902
> 13924 2431.8 4224.48 1.737184756
> 14335 2469.4 4218.82 1.708440514
> 15018 2675.67 1904.07 0.711622886
> 16374 2989.75 5296.26 1.771470902
> 24564 4262.15 7696.86 1.805863077
> 27852 4362.53 3347.72 0.7673805572
> 28672 5122.8 7113.14 1.388524413
> 30033 4874.62 8740.04 1.792967931

The MM numbers for 15018 and 27852 are *way* odd: both come in at less than
half of what the neighbouring sizes take. I don't feel really good about this
until we understand what happened in those two cases.
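
One rough way to make the two oddballs stand out is to divide each MM time by
the copy size and compare against the mean. The sketch below does that for the
quoted rows. The struct, the function names and the 0.6 cut-off are made up for
illustration. The 119-byte row is left out on the assumption that fixed
overhead dominates at that size, and the time unit is whatever the original
benchmark reported.

```c
#include <stddef.h>

/* One line of the quoted table: copy size plus the XM and MM timings. */
struct row {
    size_t size;
    double xm;
    double mm;
};

/* Rows copied from the quoted results; the 119-byte row is omitted. */
const struct row table[] = {
    { 12273, 2307.86, 4042.88 },
    { 13924, 2431.80, 4224.48 },
    { 14335, 2469.40, 4218.82 },
    { 15018, 2675.67, 1904.07 },
    { 16374, 2989.75, 5296.26 },
    { 24564, 4262.15, 7696.86 },
    { 27852, 4362.53, 3347.72 },
    { 28672, 5122.80, 7113.14 },
    { 30033, 4874.62, 8740.04 },
};

/* MM time per byte copied, in the benchmark's own time unit. */
double mm_per_byte(const struct row *r)
{
    return r->mm / (double)r->size;
}

/*
 * Store in out[] the index of every row whose MM per-byte cost is below
 * frac times the mean per-byte cost of all n rows. Returns how many rows
 * were flagged. out[] must have room for n entries.
 */
size_t flag_odd_rows(const struct row *rows, size_t n, double frac,
                     size_t *out)
{
    double sum = 0.0;
    size_t flagged = 0;

    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i++)
        sum += mm_per_byte(&rows[i]);

    for (size_t i = 0; i < n; i++)
        if (mm_per_byte(&rows[i]) < frac * (sum / (double)n))
            out[flagged++] = i;

    return flagged;
}
```

Called as flag_odd_rows(table, 9, 0.6, idx), this picks out only the 15018 and
27852 rows. Every other row sits around 0.25 to 0.33 per byte, while those two
are near 0.12.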

Also, any time I see "10000 iterations", I ask myself whether the benchmark
rigging took proper note of hot/cold cache issues. That *may* explain the two
oddball results we see above - but without knowing more about how it was
benchmarked, it's hard to say.
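
For illustration only, here is a minimal userspace sketch of timing the same
copy with warm and with cold caches. It is not the harness that produced the
numbers above, which isn't described in this thread. bench_memcpy(), the
64 MiB eviction buffer and the 64-byte line size are all assumptions, and the
eviction buffer has to be larger than the last-level cache of the machine
under test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumptions: 64 MiB exceeds the last-level cache; lines are 64 bytes. */
#define EVICT_BYTES ((size_t)64 * 1024 * 1024)
#define CACHE_LINE  64

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e9 +
           (double)(b->tv_nsec - a->tv_nsec);
}

/* Write one byte per cache line of a large buffer to push src/dst out. */
static void evict_caches(volatile unsigned char *scratch)
{
    for (size_t i = 0; i < EVICT_BYTES; i += CACHE_LINE)
        scratch[i]++;
}

/*
 * Average nanoseconds per copy of size bytes over iters timed copies.
 * cold != 0: evict the caches before every timed copy.
 * cold == 0: one untimed warm-up copy, then back-to-back timed copies.
 * Returns a negative value on bad arguments or allocation failure.
 */
double bench_memcpy(size_t size, int iters, int cold)
{
    /* Calling through a volatile pointer stops the compiler from
     * inlining or dropping the copy. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    volatile unsigned char *scratch = NULL;
    unsigned char *src, *dst;
    struct timespec t0, t1;
    double total = 0.0;

    if (size == 0 || iters <= 0)
        return -1.0;

    src = malloc(size);
    dst = malloc(size);
    if (cold)
        scratch = calloc(EVICT_BYTES, 1);
    if (!src || !dst || (cold && !scratch)) {
        free(src);
        free(dst);
        free((void *)scratch);
        return -1.0;
    }

    memset(src, 0xa5, size);
    memset(dst, 0, size);        /* fault the pages in before timing */
    if (!cold)
        copy(dst, src, size);    /* warm-up, not timed */

    for (int i = 0; i < iters; i++) {
        if (cold)
            evict_caches(scratch);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        copy(dst, src, size);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += elapsed_ns(&t0, &t1);
    }

    free(src);
    free(dst);
    free((void *)scratch);
    return total / iters;
}
```

Comparing bench_memcpy(15018, 10000, 0) against a cold run of the same size
with a much smaller iteration count (each cold iteration walks the whole
eviction buffer) would show how much of a result comes from the cache state
rather than from the copy routine. Each sample also includes the cost of the
two clock_gettime() calls, so very small sizes need a different approach.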
