> Make sure you benchmark both the cached and uncached cases.
I did a test copying large amounts of memory around in 64kb chunks..
Cached:
3dnow : 25 secs
integer : 3 minutes
Non-Cached:
3dnow : 75 secs
integer : (Still running on another box after ~15 minutes)
> If it is definitely a win to use the 3dnow code then use it.
I think the above figures speak for themselves.
Remember, I'm not running this on the correct bus speed, so the
proper benchmark will probably be much higher.
> Note that there is some stuff pending (hopefully for 2.4.0) that
> allows you to plug in multiple memcpy routines and handle the choice
> per cpu.
That sounds great.
How will these be selected, by an extra menu in the config ?
> That will also allow you to do finer tuning for the winchip.
Since my initial posting, I read that on the the Winchip 2A the prefetch
instruction is treated as a nop. Eliminating these from the routine gains
a few more benchmarks. Would that then make it the same as the MMX-copy ?
> Right now with the current draft of that code it has support for
Is this already in 2.3.x ?
regards,
Dave.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/