Re: x86 memcpy performance

From: melwyn lobo
Date: Mon Aug 15 2011 - 09:27:44 EST

I was on vacation for the last two days. Thanks for the good insights into
the issue.
Ingo, unfortunately the data we have is from a soon-to-be-released
platform and is strictly confidential at this stage.

Boris, thanks for the patch. On seeing your patch:
+void *__sse_memcpy(void *to, const void *from, size_t len)
+{
+	unsigned long src = (unsigned long)from;
+	unsigned long dst = (unsigned long)to;
+	void *p = to;
+	int i;
+	if (in_interrupt())
+		return __memcpy(to, from, len);
So what is the reason we cannot use sse_memcpy in interrupt context?
(FPU registers not saved?)
My earlier question is still not answered. There are three versions of
memcpy in the kernel:

#if (__GNUC__ >= 4)
#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
#else
#define memcpy(t, f, n)				\
	(__builtin_constant_p((n))		\
	 ? __constant_memcpy((t), (f), (n))	\
	 : __memcpy((t), (f), (n)))
#endif
#else
/*
 * kmemcheck becomes very happy if we use the REP instructions
 * because it means that we know both memory operands in advance.
 */
#define memcpy(t, f, n) __memcpy((t), (f), (n))
#endif

I will ignore CONFIG_X86_USE_3DNOW (including mmx_memcpy()), as this
is valid only for AMD and not for the Atom Z5xx series.
That leaves __memcpy, __constant_memcpy and __builtin_memcpy.
I have a hunch that by default we were using __builtin_memcpy, because
my GCC version is >= 4 and CONFIG_KMEMCHECK is not defined.
Can someone confirm which of these three is used with i386_defconfig?
Also, with i386_defconfig, which workloads give the best results
with the default implementation?
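
To make the selection logic concrete, here is a minimal user-space sketch of
how the pre-GCC-4 branch of the macro quoted above picks a path. It is not
kernel code: demo_constant_memcpy, demo_generic_memcpy, demo_memcpy, demo_run
and the two counters are hypothetical stand-ins that I made up for
illustration, and both helpers simply forward to the C library memcpy. It
assumes a compiler that provides __builtin_constant_p (GCC or Clang).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counters recording which path each copy took. */
static int constant_calls;
static int generic_calls;

/* Stand-in for __constant_memcpy: reached when the length is a constant. */
static void *demo_constant_memcpy(void *to, const void *from, size_t n)
{
	constant_calls++;
	return memcpy(to, from, n);
}

/* Stand-in for __memcpy: reached when the length is only known at run time. */
static void *demo_generic_memcpy(void *to, const void *from, size_t n)
{
	generic_calls++;
	return memcpy(to, from, n);
}

/* Same shape as the quoted macro, with the stand-ins swapped in. */
#define demo_memcpy(t, f, n)				\
	(__builtin_constant_p((n))			\
	 ? demo_constant_memcpy((t), (f), (n))		\
	 : demo_generic_memcpy((t), (f), (n)))

/* Copies once with a constant length and once with a runtime length. */
static void demo_run(void)
{
	char src[16] = "0123456789abcde";
	char dst[16] = { 0 };
	/* volatile, so the compiler can never treat it as a constant */
	volatile size_t runtime_len = 8;

	(void)demo_memcpy(dst, src, sizeof(src));	/* constant length */
	(void)demo_memcpy(dst, src, runtime_len);	/* runtime length */
}
```

Calling demo_run() once should bump each counter exactly once, since
sizeof(src) is a compile-time constant and the volatile length is not. Note
that with GCC >= 4 the real macro skips this dispatch entirely and hands the
call to __builtin_memcpy, which is the case my question is about.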
