Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.

From: H. Peter Anvin
Date: Fri Nov 13 2009 - 01:04:36 EST


On 11/12/2009 09:33 PM, Ma, Ling wrote:
>> Well, so you are running cache hot and it is only a win on huge
>> copies... how common are those?
>>
> Hi Pavel Machek
> Yes, we intend to introduce movsq for huge hot copies (over 1024 bytes)
> and avoid a regression for copies of less than 1024 bytes. I guess you
> suggest using the prefetch instruction for cold data (if I am wrong,
> please correct me). memcpy doesn't know whether the data is already in
> cache or not, so prefetch only helps when the copy size is over
> (first-level cache)/2 and under (last-level cache)/2. Currently the
> first-level cache size of most CPUs is around 32KB, so prefetch is useful
> when the copy size is over 16KB, but as H. Peter Anvin mentioned in his
> last email, copies over 16KB are rare in the kernel.
>

It sounds to me like for Nehalem we want to use memcpy_c for >= 1024
bytes and the old code for < 1024 bytes; for Core2 it might be the
exact opposite.

Either way, whatever we do should use the appropriate static replacement
mechanism.
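
As a rough illustration of the idea above, here is a minimal userspace sketch in C. It is not the kernel's code: the kernel would presumably patch the chosen routine in at boot rather than go through a function pointer, and every name below (copy_unrolled, copy_string, select_copy, enum cpu_model, FAST_STRING_THRESHOLD) is hypothetical. The 1024-byte cutoff comes from this thread; the Core2 policy is only the "might be the exact opposite" guess, not a measured result.

```c
#include <stddef.h>
#include <string.h>

/* Cutoff discussed in the thread; treat it as a tunable, not a fact. */
#define FAST_STRING_THRESHOLD 1024

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for the existing ("old") copy loop. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for the string-instruction variant (memcpy_c).
 * A real version would use rep movsq; libc memcpy keeps this portable. */
static void *copy_string(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Policy A: string copy for large sizes, old code below the cutoff. */
static void *copy_string_for_large(void *dst, const void *src, size_t n)
{
    return n >= FAST_STRING_THRESHOLD ? copy_string(dst, src, n)
                                      : copy_unrolled(dst, src, n);
}

/* Policy B: the opposite assignment (assumed, not measured). */
static void *copy_string_for_small(void *dst, const void *src, size_t n)
{
    return n < FAST_STRING_THRESHOLD ? copy_string(dst, src, n)
                                     : copy_unrolled(dst, src, n);
}

/* Hypothetical CPU classification; real code would use CPUID features. */
enum cpu_model { CPU_NEHALEM, CPU_CORE2 };

/* Chosen once at startup, so callers never re-test the CPU type. */
static copy_fn active_copy = copy_unrolled;

static void select_copy(enum cpu_model model)
{
    active_copy = (model == CPU_NEHALEM) ? copy_string_for_large
                                         : copy_string_for_small;
}
```

A caller would run select_copy() once during initialization and then call active_copy(dst, src, n) everywhere. The point of a static replacement mechanism is to get the same one-time decision without paying for an indirect call on every copy.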

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
