RE: [BUG mm] "fixed" i386 memcpy inlining buggy

From: Richard B. Johnson
Date: Wed Apr 06 2005 - 08:24:07 EST



Attached is an inline ix86 memcpy() plus test code that exercises
its corner cases. The inline code makes no jumps; it uses longword
copies, then one word copy and one spare byte copy as needed. It
works at all offsets and doesn't require alignment, but is fastest
when both source and destination are longword aligned.
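
For anyone who reads C more easily than string instructions, here is
a rough sketch of how the length gets split into longwords, one
optional word and one optional byte. split_copy() is a hypothetical
name and is not part of the attachment. Unlike the asm it uses
ordinary loops, so it does contain jumps; it only illustrates the
arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the attachment: splits len the same
   way the inline asm does, but in portable C. */
static void *split_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t longs = len >> 2;        /* count used by "rep movsl"   */
    size_t word  = (len >> 1) & 1;  /* carry out of the second shr */
    size_t byte  = len & 1;         /* carry out of the first shr  */

    /* Fixed-size memcpy() calls stand in for movsl/movsw/movsb and
       keep unaligned accesses well defined in C. */
    for (; longs; longs--, d += 4, s += 4)
        memcpy(d, s, 4);
    for (; word; word--, d += 2, s += 2)
        memcpy(d, s, 2);
    for (; byte; byte--, d += 1, s += 1)
        memcpy(d, s, 1);
    return dst;
}
```

With len = 7 this does one longword, one word and one byte, so
split_copy(out, "0123456", 7) fills out[0] through out[6].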

On Wed, 6 Apr 2005, Dave Korn wrote:

----Original Message----
From: Dave Korn
Sent: 06 April 2005 12:13

----Original Message----
From: Dave Korn
Sent: 06 April 2005 12:06


Me and my big mouth.

OK, that one does work.

Sorry for the outburst.



.... well, actually, maybe it doesn't after all.


What's that uninitialised variable ecx doing there eh?


cheers,
DaveK
--
Can't think of a witty .sigline today....



Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
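#include <stdio.h>
#include <string.h>


//
// Inline ix86 memcpy() that contains no jumps. Not copied from
// anybody. Contributed by rjohnson@xxxxxxxxxxxx
//
// 32-bit x86 only. Named inline_memcpy() here so that it does not
// clash with the memcpy() prototype in <string.h>.
//

static __inline__ void *inline_memcpy(void *dst, const void *src, size_t len) {
    void *ret = dst;
    __asm__ __volatile__(
        "cld\n"                 // copy forwards
        "shr $1, %%ecx\n"       // CF = bit 0 of len (spare byte)
        "pushf\n"               // save it
        "shr $1, %%ecx\n"       // CF = bit 1 of len (spare word), ecx = len / 4
        "pushf\n"               // save it
        "rep\n"
        "movsl\n"               // copy len / 4 longwords, ecx ends at 0
        "popf\n"                // CF = bit 1 of len
        "adcl %%ecx, %%ecx\n"   // ecx = 0 + 0 + CF
        "rep\n"
        "movsw\n"               // copy 0 or 1 word
        "popf\n"                // CF = bit 0 of len
        "adcl %%ecx, %%ecx\n"   // ecx = 0 + 0 + CF
        "rep\n"
        "movsb\n"               // copy 0 or 1 byte
        : "=D" (dst), "=S" (src), "=c" (len)
        : "0" (dst), "1" (src), "2" (len)
        : "memory", "cc" );
    return ret;
}


// 80 characters plus the terminating NUL
const char tester[] = "0123456789"
                      "0123456789"
                      "0123456789"
                      "0123456789"
                      "0123456789"
                      "0123456789"
                      "0123456789"
                      "0123456789";

int main(void)
{
    size_t i;
    char buf[0x1000];

    // Copy every length from 0 to 80 to each destination offset 0..3.
    // The loops stop at sizeof(tester) so the copy never reads past
    // the end of the source string.
    memset(buf, 0x00, sizeof(buf));
    for (i = 0; i < sizeof(tester); i++)
        puts(inline_memcpy(buf, tester, i));
    memset(buf, 0x00, sizeof(buf));
    for (i = 0; i < sizeof(tester); i++)
        puts(inline_memcpy(&buf[1], tester, i));
    memset(buf, 0x00, sizeof(buf));
    for (i = 0; i < sizeof(tester); i++)
        puts(inline_memcpy(&buf[2], tester, i));
    memset(buf, 0x00, sizeof(buf));
    for (i = 0; i < sizeof(tester); i++)
        puts(inline_memcpy(&buf[3], tester, i));
    return 0;
}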