Re: Fast memcpy patch

From: N. Coesel
Date: Wed Nov 23 2011 - 07:51:56 EST


Sasha,

At 13:10 23-11-2011, Sasha Levin wrote:
> On Wed, 2011-11-23 at 12:25 +0100, N. Coesel wrote:
> > Dear readers,
> > I noticed the Linux kernel still uses a byte-by-byte copy method for
> > memcpy. Since most memory allocations are aligned to the integer size
> > of a cpu it is often faster to copy by using the CPU's native word
> > size. The patch below does that. The code is already at work in many
> > 16 and 32 bit embedded products. It should also work for 64 bit
> > platforms. So far I only tested 16 and 32 bit platforms.
>
> [snip]
>
> memcpy (along with other mem* functions) are arch specific - for
> example, look at arch/x86/lib/memcpy_64.S for the implementation(s) for
> x86.
>
> The code under lib/string.c is simple and should work on all platforms
> (and is probably not being used anywhere anymore).

Thanks for pointing that out. Currently my primary target is ARM. It seems the memcpy for that arch also copies byte by byte, with some loop unrolling. I modified the code so that it copies word by word if the pointers are aligned on word boundaries; if not, it reverts to the old method. For clarity: by word I mean the CPU's native bus width. In the case of ARM that's (still) 32 bits.
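
To make the idea concrete, here is a minimal sketch of the approach described above. It is not the actual patch: the name memcpy_words is hypothetical, and using unsigned long as the native word is an assumption (it holds for Linux on ARM, where long is 32 bits). The casts to unsigned long * also assume a build without strict aliasing, as the kernel does with -fno-strict-aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only; not the code from the patch. */
void *memcpy_words(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Take the word path only if BOTH pointers are word aligned. */
	if ((((uintptr_t)d | (uintptr_t)s) & (sizeof(unsigned long) - 1)) == 0) {
		/* Assumption: unsigned long matches the native word size. */
		unsigned long *dw = (unsigned long *)d;
		const unsigned long *sw = (const unsigned long *)s;

		while (n >= sizeof(unsigned long)) {
			*dw++ = *sw++;
			n -= sizeof(unsigned long);
		}
		d = (unsigned char *)dw;
		s = (const unsigned char *)sw;
	}

	/* Old method: byte-by-byte, for unaligned pointers or the tail. */
	while (n--)
		*d++ = *s++;

	return dest;
}
```

Calling memcpy_words(dst, src, len) behaves like memcpy for non-overlapping buffers; the word path is only taken when both pointers happen to be aligned, so unaligned callers see no change in behaviour.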

Nico Coesel

