Re: [PATCH 1/2] perf bench: port memcpy_64.S to perf bench

From: Miao Xie
Date: Mon Dec 20 2010 - 01:29:23 EST


On Sun, 19 Dec 2010 01:25:00 +0900, Hitoshi Mitake wrote:
On 2010-10-31 04:21, Ingo Molnar wrote:

* Peter Zijlstra<a.p.zijlstra@xxxxxxxxx> wrote:

On Sat, 2010-10-30 at 01:01 +0900, Hitoshi Mitake wrote:
This patch ports arch/x86/lib/memcpy_64.S to "perf bench mem".
When PERF_BENCH is defined at the preprocessor level,
memcpy_64.S is preprocessed into a form that the sources
under tools/perf can include for benchmarking programs.

Signed-off-by: Hitoshi Mitake<mitake@xxxxxxxxxxxxxxxxxxxxx>
Cc: Ma Ling:<ling.ma@xxxxxxxxx>
Cc: Zhao Yakui<yakui.zhao@xxxxxxxxx>
Cc: Peter Zijlstra<a.p.zijlstra@xxxxxxxxx>
Cc: Arnaldo Carvalho de Melo<acme@xxxxxxxxxx>
Cc: Paul Mackerras<paulus@xxxxxxxxx>
Cc: Frederic Weisbecker<fweisbec@xxxxxxxxx>
Cc: Steven Rostedt<rostedt@xxxxxxxxxxx>
Cc: Thomas Gleixner<tglx@xxxxxxxxxxxxx>
Cc: H. Peter Anvin<hpa@xxxxxxxxx>
---
arch/x86/lib/memcpy_64.S | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 75ef61e..72c6dfe 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -1,10 +1,23 @@
/* Copyright 2002 Andi Kleen */

+/*
+ * perf bench adoption by Hitoshi Mitake
+ * PERF_BENCH means that this file is included from
+ * the source files under tools/perf/ for benchmark programs.
+ *
+ * You don't have to care about PERF_BENCH when
+ * you are working on the kernel.
+ */
+
+#ifndef PERF_BENCH

I don't like littering the actual kernel code with tools/perf/
ifdeffery..


Yeah - could we somehow accept that file into a perf build as-is?

Thanks,

Ingo


Really sorry for my slow work...

BTW, I have a question for Miao and Ingo.
We are planning to adopt Miao's new memcpy(),
and the important point is to keep the previous memcpy()
implementations for future architectures and for benchmarking.

I feel that adding a new CPU feature flag (like X86_FEATURE_REP_GOOD)
and switching memcpy() with the alternatives mechanism is a good way.
(So we will have three memcpy()s: the rep-based one, the unrolled one,
and the new unaligned-oriented one.)
But there is another way: #ifdef. Which do you prefer?
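
The runtime-selection option can be sketched in plain user-space C. This is
only an illustration: the feature bits, the selection order, and the three
function bodies are hypothetical stand-ins, not the kernel's code. The
kernel's alternatives mechanism patches instructions at boot, whereas this
sketch picks a function pointer once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits for illustration; not the kernel's definitions. */
#define FEAT_REP_GOOD        (1u << 0)
#define FEAT_FAST_UNALIGNED  (1u << 1)  /* assumed name for the proposed flag */

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Stand-in for the rep-based variant; simply defers to libc here. */
static void *memcpy_rep(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for the unrolled variant: four bytes per loop iteration. */
static void *memcpy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for the new unaligned-oriented variant: plain byte copy. */
static void *memcpy_unaligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Pick one variant from the feature bits; the priority order is an assumption. */
static memcpy_fn select_memcpy(unsigned int features)
{
    if (features & FEAT_FAST_UNALIGNED)
        return memcpy_unaligned;
    if (features & FEAT_REP_GOOD)
        return memcpy_rep;
    return memcpy_unrolled;
}
```

With #ifdef only one variant is compiled in, so the choice is fixed at build
time. With selection at run time all three variants stay in the binary, which
is what keeps the older ones available for benchmarking.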

I agree with your idea, but Ma Ling said this approach may cause
i-cache misses.
http://marc.info/?l=linux-kernel&m=128746120107953&w=2
(The i-cache is 32 KB; memcpy() in my patch is 560 bytes,
and the latest version in the tip tree is 400 bytes.)

But I have not tested it, so I don't know the real result. Maybe we should
try to implement the new memcpy() first.
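
For a rough sense of scale, the sizes above can be converted into cache
lines. The sketch below assumes 64-byte lines (the cache_alignment value
reported by /proc/cpuinfo on these CPUs) and code that starts on a line
boundary. Under those assumptions 560 bytes occupy 9 of the 512 lines in a
32 KB i-cache and 400 bytes occupy 7. It says nothing about the actual miss
rate, which still has to be measured.

```c
#include <stddef.h>

/* Assumed figures: 32 KB i-cache from the mail, 64-byte lines from cpuinfo. */
#define ICACHE_BYTES     (32u * 1024u)
#define CACHE_LINE_BYTES 64u

/*
 * Number of cache lines occupied by a code region of `size` bytes,
 * assuming the region starts on a cache-line boundary.
 */
static size_t lines_touched(size_t size, size_t line_size)
{
    return (size + line_size - 1) / line_size;
}

/* Total number of lines in a cache of `cache_size` bytes. */
static size_t total_lines(size_t cache_size, size_t line_size)
{
    return cache_size / line_size;
}
```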

And could you tell me the details of the CPU family
you are targeting, Miao?

They are the Core2 Duo E7300 (core name: Wolfdale) and the Xeon X5260 (core name: Wolfdale-DP).

The following is the detailed information for these two CPUs:
Core2 Duo E7300:
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
stepping : 6
cpu MHz : 1603.000
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dts
bogomips : 5319.70
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Xeon X5260:
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5260 @ 3.33GHz
stepping : 6
cpu MHz : 1999.000
cache size : 6144 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow vnmi flexpriority
bogomips : 6649.07
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
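
Both flag lists above include rep_good. A minimal sketch of checking a flags
line for one whole flag name follows; has_cpu_flag() is a hypothetical helper,
not an existing kernel or perf function. It matches complete tokens, so
looking for "sse" does not match "sse4_1".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the separators that can surround a flag name in a flags line. */
static bool is_flag_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return true if `name` appears as a whole token in `flags`, where
 * `flags` is the text after "flags :" in /proc/cpuinfo.
 */
static bool has_cpu_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == flags) || is_flag_sep(p[-1]);
        bool ends = (p[len] == '\0') || is_flag_sep(p[len]);

        if (starts && ends)
            return true;
        p += len;  /* skip this partial match and keep searching */
    }
    return false;
}
```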

Thanks
Miao