Jeff Garzik <jgarzik@xxxxxxxxx> wrote:
Linux Kernel Mailing List wrote:
ChangeSet 1.1496.22.32, 2003/12/29 21:45:30-08:00, akpm@xxxxxxxx
[PATCH] optimize ia32 memmove
From: Manfred Spraul <manfred@xxxxxxxxxxxxxxxx>
The memmove implementation of i386 is not optimized: it uses movsb, which is
far slower than movsd. The optimization is trivial: if dest is less than
source, then call memcpy(). markw tried it on a 4xXeon with dbt2, it saved
around 300 million cpu ticks in cache_flusharray():
[...]
diff -Nru a/include/asm-i386/string.h b/include/asm-i386/string.h
--- a/include/asm-i386/string.h Mon Dec 29 23:13:20 2003
+++ b/include/asm-i386/string.h Mon Dec 29 23:13:20 2003
@@ -299,14 +299,9 @@
static inline void * memmove(void * dest,const void * src, size_t n)
{
int d0, d1, d2;
-if (dest<src)
-__asm__ __volatile__(
- "rep\n\t"
- "movsb"
- : "=&c" (d0), "=&S" (d1), "=&D" (d2)
- :"0" (n),"1" (src),"2" (dest)
- : "memory");
-else
+if (dest<src) {
+ memcpy(dest,src,n);
+} else
__asm__ __volatile__(
"std\n\t"
"rep\n\t"
Dumb question, though... what about the overlap case, when dest<src ? It seems to me this change is ignoring that.
"if dest is less that source, then call memcpy". If the move is to a
higher address we do it the old way.