Re: [PATCH] optimize ia32 memmove

From: Jeff Garzik
Date: Tue Dec 30 2003 - 02:58:16 EST


Andrew Morton wrote:
Jeff Garzik <jgarzik@xxxxxxxxx> wrote:

Linux Kernel Mailing List wrote:

ChangeSet 1.1496.22.32, 2003/12/29 21:45:30-08:00, akpm@xxxxxxxx

[PATCH] optimize ia32 memmove

From: Manfred Spraul <manfred@xxxxxxxxxxxxxxxx>

The memmove implementation of i386 is not optimized: it uses movsb, which is
far slower than movsd. The optimization is trivial: if dest is less than
source, then call memcpy(). markw tried it on a 4xXeon with dbt2; it saved
around 300 million cpu ticks in cache_flusharray():

[...]

diff -Nru a/include/asm-i386/string.h b/include/asm-i386/string.h
--- a/include/asm-i386/string.h Mon Dec 29 23:13:20 2003
+++ b/include/asm-i386/string.h Mon Dec 29 23:13:20 2003
@@ -299,14 +299,9 @@
static inline void * memmove(void * dest,const void * src, size_t n)
{
int d0, d1, d2;
-if (dest<src)
-__asm__ __volatile__(
- "rep\n\t"
- "movsb"
- : "=&c" (d0), "=&S" (d1), "=&D" (d2)
- :"0" (n),"1" (src),"2" (dest)
- : "memory");
-else
+if (dest<src) {
+ memcpy(dest,src,n);
+} else
__asm__ __volatile__(
"std\n\t"
"rep\n\t"

Dumb question, though... what about the overlap case, when dest<src? It seems to me this change is ignoring that.



"if dest is less than source, then call memcpy". If the move is to a
higher address we do it the old way.


I'm confused... that doesn't say anything to me about overlap.

They can still overlap: Consider if dest is 1 byte less than src, and n==128...
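
To make that case concrete, here is a minimal sketch in plain C. It is not the asm-i386 code; every name in it (copy_forward, copy_backward, sketch_memmove, overlap_case_ok, run_overlap_checks) is hypothetical. copy_forward stands in for a memcpy() that copies from the lowest address upward, which is an assumption about the implementation: the C standard promises nothing for memcpy() on overlapping regions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcpy() that copies lowest address first. */
void *copy_forward(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical stand-in for the "std; rep movsb" path: highest address first. */
void *copy_backward(void *dest, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dest + n;
    const unsigned char *s = (const unsigned char *)src + n;

    while (n--)
        *--d = *--s;
    return dest;
}

/* Direction dispatch as described in the changelog text. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    if ((const unsigned char *)dest < (const unsigned char *)src)
        return copy_forward(dest, src, n);
    return copy_backward(dest, src, n);
}

/*
 * The case in question: dest is 1 byte below src, n == 128.
 * Returns 1 if dest ends up holding the original source bytes, else 0.
 */
int overlap_case_ok(void *(*copy)(void *, const void *, size_t))
{
    unsigned char buf[129];
    size_t i;

    for (i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char)i;

    copy(buf, buf + 1, 128);

    for (i = 0; i < 128; i++)
        if (buf[i] != (unsigned char)(i + 1))
            return 0;
    return 1;
}

/* Aborts via assert() if any of the three expectations fails. */
void run_overlap_checks(void)
{
    assert(overlap_case_ok(copy_forward) == 1);   /* forward copy survives */
    assert(overlap_case_ok(copy_backward) == 0);  /* backward copy clobbers */
    assert(overlap_case_ok(sketch_memmove) == 1); /* dispatch picks forward */
}
```

In this sketch the forward copy reads each source byte before the write that would overwrite it, so the dest<src overlap comes out right only if the real memcpy() is guaranteed to copy in ascending order. The backward copy gets the same case wrong.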

Jeff


