Re: [PATCH] optimize ia32 memmove

From: Manfred Spraul
Date: Tue Dec 30 2003 - 06:14:05 EST


Jeremy Fitzhardinge wrote:

>On Tue, 2003-12-30 at 01:58, Linus Torvalds wrote:
>
>>But then anything that does the loads in ascending order is still ok, so
>>it shouldn't matter - by the time "dest" has been overwritten, the source
>>data has already been read. And all the "memcpy()" implementations had
>>better do that anyway, in order to get nice memory access patterns. "rep
>>movsl" certainly does.

AMD recommends to perform bulk copies backwards: That defeats the hw prefecher, and results in even better access patterns. Doesn't matter in this case, memmove is never used for bulk copies.

>A PPC memcpy may end up clearing the destination before reading the
>source (using the cache-line zeroing instruction, to prevent the
>destination from being spuriously read to populate the cache line).
>
The change is i386 only, no effect on other archs.
I found the unoptimized memmove in oprofiles of dbt2 testruns: slab contains a few memmoves to keep it's recently used arrays in strict LIFO order. Typically perhaps 100 bytes.

--
Manfred


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/