Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.

From: H. Peter Anvin
Date: Fri Nov 13 2009 - 03:05:17 EST


On 11/12/2009 11:33 PM, Ingo Molnar wrote:
>
> * Pavel Machek <pavel@xxxxxx> wrote:
>
>>> Ling, if you are interested, could you send a user-space test-app to
>>> this thread that everyone could just compile and run on various older
>>> boxes, to gather a performance profile of hand-coded versus string ops
>>> performance?
>>>
>>> ( And I think we can make a judgement based on cache-hot performance
>>> alone - if anything, the string ops will perform comparatively better
>>> in cache-cold scenarios, so the cache-hot numbers would be a
>>> conservative estimate. )
>>
>> Ugh, really? I'd expect cache-cold performance to be not helped at all
>> (memory bandwidth limit) and you'll get a slowdown from additional
>> i-cache misses...
>
> That's my point - the new code is shorter, which will run comparatively
> faster in a cache-cold environment.
>

memcpy_c by itself is by far the shortest variant, of course.

The question is whether it makes sense to use the long variants for short
(< 1024 bytes) copies.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
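
A minimal user-space sketch of the kind of test app requested in the quoted
text, for comparing a string-op copy against an unrolled copy at sizes around
the 1024-byte mark. Everything here is an assumption for illustration: the
names copy_string_ops, copy_unrolled, ns_per_copy and run_copy_benchmark are
hypothetical, the unrolled routine is a plain C stand-in rather than the
kernel's hand-coded assembly (the compiler is free to rewrite it), and the
loop measures cache-hot behaviour only. It needs GCC or Clang; on non-x86-64
targets the string-op variant falls back to memcpy().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* String-op copy: rep movsq for whole quadwords, rep movsb for the tail. */
static void *copy_string_ops(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__)
    void *d = dst;
    size_t qwords = n >> 3;
    size_t tail = n & 7;

    __asm__ volatile("rep movsq"
                     : "+D"(d), "+S"(src), "+c"(qwords) : : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(tail) : : "memory");
#else
    memcpy(dst, src, n);   /* fallback: no string ops on this target */
#endif
    return dst;
}

/* Stand-in for a hand-coded variant: 32 bytes per iteration, byte tail. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
        uint64_t a, b, c, e;
        memcpy(&a, s, 8);      memcpy(&b, s + 8, 8);
        memcpy(&c, s + 16, 8); memcpy(&e, s + 24, 8);
        memcpy(d, &a, 8);      memcpy(d + 8, &b, 8);
        memcpy(d + 16, &c, 8); memcpy(d + 24, &e, 8);
        d += 32; s += 32; n -= 32;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Average nanoseconds per call for cache-hot copies of `size` bytes. */
static double ns_per_copy(copy_fn fn, size_t size, int iters)
{
    static unsigned char src[4096], dst[4096];
    struct timespec t0, t1;

    assert(size <= sizeof(src) && iters > 0);
    memset(src, 0xA5, sizeof(src));
    memset(dst, 0, sizeof(dst));

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        fn(dst, src, size);
        /* Compiler barrier so the copy is not optimised away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    assert(memcmp(dst, src, size) == 0);   /* sanity-check the result */
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}

/* Print one line per size, covering both sides of 1024 bytes. */
static void run_copy_benchmark(void)
{
    static const size_t sizes[] = { 16, 64, 256, 1024, 4096 };

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        printf("%5zu bytes: string ops %8.1f ns, unrolled %8.1f ns\n",
               sizes[i],
               ns_per_copy(copy_string_ops, sizes[i], 1000000),
               ns_per_copy(copy_unrolled, sizes[i], 1000000));
}
```

Calling run_copy_benchmark() from main() and building with something like
"gcc -O2" prints one line per size. The numbers depend heavily on the CPU
model and compiler, and they say nothing about the cache-cold or i-cache
effects discussed above.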
