Re: faster strcpy()

Michael O'Reilly (michael@metal.iinet.net.au)
26 Apr 1998 14:10:03 +0800


"Richard B. Johnson" <root@chaos.analogic.com> writes:
> As previously shown in assembly code, obtaining the length requires
> that the string be read.
>
> Then copying the string requires that the string be read again.

Yes, but now it's a word at a time, not a byte at a time.
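
To make the comparison concrete, here is a minimal C sketch of the two
strategies under discussion. The function names are made up for
illustration, and the library memcpy() merely stands in for the
"rep movsw" routine quoted further down; how wide its copies really are
depends on the C library and the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: copy one byte per iteration and test every byte
   for the terminating NUL, like the lodsb/stosb loop. */
char *copy_bytewise(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;   /* the copy happens in the loop condition */
    return dst;
}

/* Hypothetical name: read the string once to find its length, then do
   a single block copy that is free to move more than a byte at a time. */
char *copy_len_then_memcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1;   /* +1 so the NUL is copied as well */
    memcpy(dst, src, n);
    return dst;
}
```

Both functions leave the same bytes in dst; the only difference is how
many loads, stores and tests it takes to get there.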

> While copying the string using memcpy(), a loop count must be
> tested. While copying directly, a byte must be tested. This is
> essentially a wash.
>
> A simple test program, previously posted, that uses both methods,
> verifies my claims.
>
> As previously posted, the simplest string copy is not the most efficient;
> however, it will serve to show the point.

Indeed it does exactly that:

> Simple string copy guaranteed to work (not very efficient).
>
> mov esi,offset source ; 4 clocks
> mov edi,offset destination ; 4 clocks
> cpy: lodsb ; 6 clocks
> stosb ; 6 clocks
> or al,al ; 2 clocks
> jnz cpy ; 2 to many clocks, depends upon
> ; the cache.

So this one is ~16 * num_of_bytes + const (6 + 6 + 2 + 2 clocks per byte).

> Simple strlen, guaranteed to work (not the most efficient).
>
> mov esi,offset source ; 4 clocks
> mov edx,esi ; 2 clocks
> xor al,al ; 2 clocks
> len: lodsb ; 6 clocks
> or al,al ; 2 clocks
> jnz len ; 2 to many clocks.
> mov eax,esi ; 2 clocks
> sub eax,edx ; 2 clocks
> ; Length in eax

This is ~10 * num_of_bytes + const (6 + 2 + 2 clocks per byte).

> Simple memcpy, guaranteed to work (not the most efficient)
>
> mov esi,offset source ; 4 clocks
> mov edi,offset destination ; 4 clocks
> mov ecx,dword ptr [count] ; 6 clocks
> shr ecx,1 ; 2 clocks
> rep movsw ; 6 * number of words
> adc ecx,ecx ; 2 clocks
> rep movsb ; 6 * number of bytes

This is ~3 * num_of_bytes + const, since each 6-clock movsw moves two
bytes.

So this strlen + move is ~13 * num_of_bytes (10 + 3), whereas the strcpy
is ~16 * num_of_bytes.

> Now, if you add up the clocks for strlen() and the clocks for
> memcpy(), you can compare them to the clocks for strcpy().

And indeed, the strlen + memcpy is a lot faster, saving about 3
clocks per byte.

Total extra overhead is around 22 clocks, so for anything longer than
~8 bytes (22 / 3 is about 7.3), doing a strlen() + memcpy() is a win.
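
The break-even argument is just a linear cost model, which can be
sketched in C as below. The function names are invented for this
example, and the per-byte and fixed costs are inputs, so any of the
clock estimates in this thread can be plugged in. These are paper
estimates, not measurements.

```c
#include <assert.h>

/* Clocks under the linear model: per_byte * n + fixed. */
unsigned long model_clocks(unsigned per_byte, unsigned fixed,
                           unsigned long n)
{
    return (unsigned long)per_byte * n + fixed;
}

/* Smallest length n at which strategy B is strictly cheaper than
   strategy A, where B has the lower per-byte cost and the higher (or
   equal) fixed cost. Costs are integers; scale all four by the same
   factor to express fractional clocks. */
unsigned long break_even(unsigned a_per_byte, unsigned a_fixed,
                         unsigned b_per_byte, unsigned b_fixed)
{
    assert(b_per_byte < a_per_byte);
    assert(b_fixed >= a_fixed);
    unsigned long saving = a_per_byte - b_per_byte;  /* clocks per byte */
    unsigned long extra  = b_fixed - a_fixed;        /* one-off clocks  */
    return extra / saving + 1;   /* smallest n with saving * n > extra */
}
```

For example, with 22 clocks of extra overhead, a strategy that saves 3
clocks per byte starts to win at 8 bytes: break_even(16, 0, 13, 22)
returns 8.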

> I do this exact kind of analysis and work for a living and I am
> very good at it.

Hmmm. I can certainly say I wouldn't hire you on this showing. Making
such elementary errors is a little odd.

Michael.
