Re: [PATCH 1/2] perf bench: port memcpy_64.S to perf bench

From: Hitoshi Mitake
Date: Mon Dec 20 2010 - 10:34:47 EST


On Mon, Dec 20, 2010 at 15:30, Miao Xie <miaox@xxxxxxxxxxxxxx> wrote:
> On Sun, 19 Dec 2010 01:25:00 +0900, Hitoshi Mitake wrote:
>>
>> On 2010/10/31 04:21, Ingo Molnar wrote:
>>>
>>> * Peter Zijlstra<a.p.zijlstra@xxxxxxxxx> wrote:
>>>
>>>> On Sat, 2010-10-30 at 01:01 +0900, Hitoshi Mitake wrote:
>>>>>
>>>>> This patch ports arch/x86/lib/memcpy_64.S to "perf bench mem".
>>>>> When PERF_BENCH is defined at preprocessor level,
>>>>> memcpy_64.S is preprocessed to includable form from the sources
>>>>> under tools/perf for benchmarking programs.
>>>>>
>>>>> Signed-off-by: Hitoshi Mitake<mitake@xxxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: Ma Ling:<ling.ma@xxxxxxxxx>
>>>>> Cc: Zhao Yakui<yakui.zhao@xxxxxxxxx>
>>>>> Cc: Peter Zijlstra<a.p.zijlstra@xxxxxxxxx>
>>>>> Cc: Arnaldo Carvalho de Melo<acme@xxxxxxxxxx>
>>>>> Cc: Paul Mackerras<paulus@xxxxxxxxx>
>>>>> Cc: Frederic Weisbecker<fweisbec@xxxxxxxxx>
>>>>> Cc: Steven Rostedt<rostedt@xxxxxxxxxxx>
>>>>> Cc: Thomas Gleixner<tglx@xxxxxxxxxxxxx>
>>>>> Cc: H. Peter Anvin<hpa@xxxxxxxxx>
>>>>> ---
>>>>> arch/x86/lib/memcpy_64.S | 30 ++++++++++++++++++++++++++++++
>>>>> 1 files changed, 30 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
>>>>> index 75ef61e..72c6dfe 100644
>>>>> --- a/arch/x86/lib/memcpy_64.S
>>>>> +++ b/arch/x86/lib/memcpy_64.S
>>>>> @@ -1,10 +1,23 @@
>>>>> /* Copyright 2002 Andi Kleen */
>>>>>
>>>>> +/*
>>>>> + * perf bench adoption by Hitoshi Mitake
>>>>> + * PERF_BENCH means that this file is included from
>>>>> + * the source files under tools/perf/ for benchmark programs.
>>>>> + *
>>>>> + * You don't have to care about PERF_BENCH when
>>>>> + * you are working on the kernel.
>>>>> + */
>>>>> +
>>>>> +#ifndef PERF_BENCH
>>>>
>>>> I don't like littering the actual kernel code with tools/perf/
>>>> ifdeffery..
>>>
>>>
>>> Yeah - could we somehow accept that file into a perf build as-is?
>>>
>>> Thanks,
>>>
>>> Ingo
>>>
>>
>> Really sorry for my slow work...
>>
>> BTW, I have a question for Miao and Ingo.
>> We are planning to adopt Miao's new memcpy(),
>> and the important point is to keep the previous memcpy()
>> implementations for future architectures and for benchmarking.
>>
>> I feel that adding a new CPU feature flag (like X86_FEATURE_REP_GOOD)
>> and switching memcpy() with the alternatives mechanism is a good way.
>> (So we will have three memcpy()s: rep based, unrolled, and the new
>> unaligned-oriented one.)
>> But there is another way: #ifdef. Which do you prefer?
>
> I agree with your idea, but Ma Ling said this way may cause the i-cache
> miss problem.
>  http://marc.info/?l=linux-kernel&m=128746120107953&w=2
> (The size of the i-cache is 32K, the size of memcpy() in my patch is
> 560 bytes, and the size of the latest version in the tip tree is
> 400 bytes.)
>
> But I have not tested it, so I don't know the real result. Maybe we should
> try to implement the new memcpy() first.

I compared the icache miss behaviour of the two memcpy() implementations
using my new --wait-on patch ( https://patchwork.kernel.org/patch/408801/ ).
The results are:

Default memcpy() of the tip tree

% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-load-misses

Performance counter stats for process id '12559':

64,328 L1-icache-load-misses

0.106513157 seconds time elapsed

Miao Xie's memcpy()

% sudo ./perf stat -w /tmp/perf-stat-wait -e L1-icache-misses

Performance counter stats for process id '13159':

64,559 L1-icache-load-misses

0.107057925 seconds time elapsed

It seems that there is no serious icache miss problem.
# I tested perf bench mem memcpy on a Core i3 M 330 processor.

However, I don't understand the cache characteristics of Intel processors
well, so I have to look into this problem more deeply.

>
>> And could you tell me the detail of CPU family information
>> you are targeting, Miao?
>
> They are Core2 Duo E7300 (Core name: Wolfdale) and Xeon X5260 (Core name:
> Wolfdale-DP).
>
> The following is the detailed information of these two CPU:
> Core2 Duo E7300:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU   E7300  @ 2.66GHz
> stepping        : 6
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dts
> bogomips        : 5319.70
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
>
> Xeon X5260:
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Xeon(R) CPU      X5260  @ 3.33GHz
> stepping        : 6
> cpu MHz         : 1999.000
> cache size      : 6144 KB
> physical id     : 3
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 7
> initial apicid  : 7
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
> constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor
> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dts tpr_shadow
> vnmi flexpriority
> bogomips        : 6649.07
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 38 bits physical, 48 bits virtual
> power management:
>

Thanks for the information!

Thanks,
Hitoshi