Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy

From: Miao Xie
Date: Mon Oct 18 2010 - 02:34:39 EST


On Mon, 18 Oct 2010 14:27:40 +0800, Ma, Ling wrote:
Could you please send out the CPU info for this CPU model?

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
stepping : 6
cpu MHz : 1603.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm
bogomips : 5319.99
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Thanks
Miao


Thanks
Ling

-----Original Message-----
From: Miao Xie [mailto:miaox@xxxxxxxxxxxxxx]
Sent: Monday, October 18, 2010 2:24 PM
To: Ma, Ling
Cc: H. Peter Anvin; Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux Kernel
Subject: Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy

On Fri, 15 Oct 2010 03:43:53 +0800, Ma, Ling wrote:
The attachment includes memcpy-kernel.c (build with: cc -O2 memcpy-kernel.c -o memcpy-kernel) and the unaligned test cases on Atom.
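The attached memcpy-kernel.c is not reproduced in this archive. As a rough illustration only, a minimal userspace sketch of timing an unaligned copy might look like the following. The names time_unaligned_memcpy and align64 are hypothetical, the sketch times the C library's memcpy() rather than the kernel routine under discussion, and the 64-byte boundary is taken from the cache_alignment value in the cpuinfo output in this thread.

```c
#define _POSIX_C_SOURCE 200112L   /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: round p up to the next 64-byte boundary. */
static unsigned char *align64(unsigned char *p)
{
    return (unsigned char *)(((uintptr_t)p + 63) & ~(uintptr_t)63);
}

/* Hypothetical helper: average nanoseconds per memcpy() of `len` bytes, with
 * the source starting `src_off` bytes past a 64-byte boundary and the
 * destination 64-byte aligned. Returns a negative value on failure. */
static double time_unaligned_memcpy(size_t src_off, size_t len, unsigned iters)
{
    assert(src_off < 64 && iters > 0);
    unsigned char *sbuf = malloc(len + 128), *dbuf = malloc(len + 128);
    if (!sbuf || !dbuf) {
        free(sbuf);
        free(dbuf);
        return -1.0;
    }

    unsigned char *src = align64(sbuf) + src_off;
    unsigned char *dst = align64(dbuf);
    memset(src, 0xA5, len);   /* touch both buffers so first-touch page faults */
    memset(dst, 0, len);      /* are not included in the timed loop */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        memcpy(dst, src, len);
        /* GCC/Clang compiler barrier: keeps -O2 from merging repeated copies. */
        __asm__ __volatile__("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int ok = memcmp(dst, src, len) == 0;   /* sanity-check the copy */
    free(sbuf);
    free(dbuf);
    if (!ok)
        return -1.0;
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling it for src_off = 0..7 with the same len gives a per-misalignment profile for one machine; comparing such profiles across CPUs is roughly the kind of comparison being made in this thread.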

I have tested it on my Core2 Duo machine with your benchmark tool; the attachment is the test result. However, my result differs from yours on Atom: on this machine, the performance seems to be better with this patch.

Given these two different results, maybe we need to optimize memcpy() per CPU model.
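One way to picture "optimize memcpy() per CPU model" is a small selector keyed on the family/model signature. The sketch below is hypothetical user-space code, not how the kernel chooses its memcpy: it decodes CPUID leaf 1 EAX into the same family/model/stepping fields shown in the cpuinfo above (0x00010676 decodes to family 6, model 23, stepping 6) and routes family 6 model 0x1C (first-generation Atom) to the original routine. The names memcpy_orig, memcpy_unaligned_opt and select_memcpy are illustrative stand-ins, and which variant should win on which model is exactly what the benchmark results in this thread are meant to decide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields decoded from CPUID leaf 1 EAX; /proc/cpuinfo prints them as
 * "cpu family", "model" and "stepping". */
struct cpu_sig { unsigned family, model, stepping; };

static struct cpu_sig decode_cpuid1_eax(unsigned eax)
{
    struct cpu_sig s;
    s.stepping = eax & 0xF;
    s.model    = (eax >> 4) & 0xF;
    s.family   = (eax >> 8) & 0xF;
    if (s.family == 0xF)
        s.family += (eax >> 20) & 0xFF;        /* extended family */
    if (s.family == 0x6 || s.family >= 0xF)
        s.model += ((eax >> 16) & 0xF) << 4;   /* extended model */
    return s;
}

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Hypothetical stand-ins for "without the patch" and "with the patch".
 * Both simply call memcpy() so that the sketch runs anywhere. */
static void *memcpy_orig(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *memcpy_unaligned_opt(void *d, const void *s, size_t n) { return memcpy(d, s, n); }

struct memcpy_impl { const char *name; memcpy_fn fn; };

static const struct memcpy_impl impl_orig = { "orig", memcpy_orig };
static const struct memcpy_impl impl_unaligned_opt = { "unaligned_opt", memcpy_unaligned_opt };

/* Per-model choice: family 6 model 0x1C keeps the original routine, all
 * other models take the patched one. A real table would also check
 * vendor_id (GenuineIntel). */
static const struct memcpy_impl *select_memcpy(struct cpu_sig s)
{
    if (s.family == 6 && s.model == 0x1C)
        return &impl_orig;
    return &impl_unaligned_opt;
}
```

Returning a named descriptor rather than a bare function pointer keeps the choice easy to log when comparing results across machines.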

Thanks
Miao


Thanks
Ling

-----Original Message-----
From: Ma, Ling
Sent: Thursday, October 14, 2010 9:14 AM
To: 'H. Peter Anvin'; miaox@xxxxxxxxxxxxxx
Cc: Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux Kernel
Subject: RE: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy

Sure, I will post the benchmark tool and benchmark results on 64-bit Atom soon.

Thanks
Ling

-----Original Message-----
From: H. Peter Anvin [mailto:hpa@xxxxxxxxx]
Sent: Thursday, October 14, 2010 5:32 AM
To: miaox@xxxxxxxxxxxxxx
Cc: Ma, Ling; Ingo Molnar; Andi Kleen; Thomas Gleixner; Zhao, Yakui; Linux Kernel
Subject: Re: [PATCH V2 -tip] lib,x86_64: improve the performance of memcpy() for unaligned copy

On 10/08/2010 02:02 AM, Miao Xie wrote:
On Fri, 8 Oct 2010 15:42:45 +0800, Ma, Ling wrote:
Could you please give us the full addresses for each comparison result? We will do some tests on my machine.
For unaligned cases, older CPUs will cross cache lines and slow down on the resulting split loads and stores, but on NHM (Nehalem) there is no need to care about it.
By the way, in 64-bit kernel mode, our accesses should generally be 8-byte aligned.
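The cache-line argument can be made concrete with a little arithmetic. The sketch below assumes the 64-byte line size reported in the cpuinfo earlier in this thread and a copy loop that issues 8-byte loads; crosses_cache_line and split_loads are hypothetical helpers, not kernel functions, and they only count which loads span two lines, not what each split costs on a given CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed: clflush size / cache_alignment from cpuinfo */

/* Nonzero if a `size`-byte access starting at `addr` spans two cache lines. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    return (addr & (CACHE_LINE - 1)) + size > CACHE_LINE;
}

/* Count how many 8-byte loads of a word-at-a-time copy of `len` bytes,
 * starting at `src`, would be split across a cache-line boundary. */
static size_t split_loads(uintptr_t src, size_t len)
{
    size_t n = 0;
    for (size_t off = 0; off + 8 <= len; off += 8)
        n += crosses_cache_line(src + off, 8);
    return n;
}
```

With this model, an 8-byte-aligned source never splits a line, while any misalignment from 1 to 7 bytes splits exactly one of every eight 8-byte loads (64 of the 512 loads in a 4096-byte copy), which is the penalty described above for older CPUs.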

Do you need my benchmark tool? I think it would be helpful for your tests.


If you could post the benchmark tool that would be great.

-hpa



