RE: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by enhanced REP MOVSB/STOSB
From: Yu, Fenghua
Date: Wed May 18 2011 - 15:06:34 EST
> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@xxxxxxx]
> Sent: Tuesday, May 17, 2011 11:36 PM
> To: Yu, Fenghua
> Cc: Thomas Gleixner; H Peter Anvin; Mallick, Asit K; Linus Torvalds;
> Avi Kivity; Arjan van de Ven; Andrew Morton; Andi Kleen; linux-kernel
> Subject: Re: [PATCH 7/9] x86/lib/memcpy_64.S: Optimize memcpy by
> enhanced REP MOVSB/STOSB
> * Fenghua Yu <fenghua.yu@xxxxxxxxx> wrote:
> > From: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > Support memcpy() with enhanced rep movsb. On processors supporting
> > enhanced rep movsb, the alternative memcpy() function using enhanced
> > rep movsb overrides the original function and the fast string function.
> > Signed-off-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > ---
> > arch/x86/lib/memcpy_64.S | 45 ++++++++++++++++++++++++++++++++----
> > 1 files changed, 32 insertions(+), 13 deletions(-)
> > ENDPROC(__memcpy)
> > /*
> > - * Some CPUs run faster using the string copy instructions.
> > - * It is also a lot simpler. Use this when possible:
> > - */
> > -
> > - .section .altinstructions, "a"
> > - .align 8
> > - .quad memcpy
> > - .quad .Lmemcpy_c
> > - .word X86_FEATURE_REP_GOOD
> > -
> > - /*
> > + * Some CPUs are adding enhanced REP MOVSB/STOSB feature
> > + * If the feature is supported, memcpy_c_e() is the first choice.
> > + * If enhanced rep movsb copy is not available, use fast string
> > + * memcpy_c() when possible. This is faster and code is simpler than
> > + * original memcpy().
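For readers following the thread, the selection order described in that comment can be sketched in plain C. This is only an illustration under stated assumptions: the feature bits, the copy_* names and select_memcpy() are hypothetical, all three variants simply forward to libc memcpy(), and the patch itself selects the variant through .altinstructions entries rather than a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits for this sketch (the kernel uses X86_FEATURE_* flags). */
enum { FEAT_REP_GOOD = 1u << 0, FEAT_ERMS = 1u << 1 };

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for the three variants; each just forwards to libc memcpy(). */
static void *copy_orig(void *d, const void *s, size_t n)        { return memcpy(d, s, n); }
static void *copy_fast_string(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_enhanced_rep(void *d, const void *s, size_t n) { return memcpy(d, s, n); }

/*
 * Choose a variant in the priority order from the patch comment:
 * enhanced rep movsb first, then fast string, then the original.
 */
static memcpy_fn select_memcpy(unsigned features)
{
    if (features & FEAT_ERMS)
        return copy_enhanced_rep;
    if (features & FEAT_REP_GOOD)
        return copy_fast_string;
    return copy_orig;
}

/* Small self-check of the priority order. */
static void check_priority(void)
{
    assert(select_memcpy(FEAT_ERMS | FEAT_REP_GOOD) == copy_enhanced_rep);
    assert(select_memcpy(FEAT_REP_GOOD) == copy_fast_string);
    assert(select_memcpy(0) == copy_orig);
}
```

The point of the sketch is only the ordering: when both features are present, the enhanced variant wins over the fast string one.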
> Please use more obvious names than cryptic and meaningless _c and _c_e
> postfixes. We do not repeat these many times.
> Also, did you know about the 'perf bench mem memcpy' tool prototype we
> have in the kernel tree? It is intended to check and evaluate exactly
> the patches you are offering here. The code lives in:
> Please look into testing (fixing if needed), using and extending it:
> - We want to measure the alternatives variants as well, not just the
>   generic one
> - We want to measure memmove, memclear, etc. operations as well, not
>   just memcpy
> - We want cache-cold and cache-hot numbers as well, going along
>   multiple sizes
> This tool can also be useful when developing these changes: they can
> be tested in user-space and can be iterated very quickly, without
> having to build and boot the kernel.
> We can commit any enhancements/fixes you do to perf bench alongside
> your memcpy patches. All in all, such measurements will make it much
> easier for us to apply the patches.
I'll work on the bench tool and will let you know when it's ready.
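As a rough illustration of the kind of measurement requested above (cache-hot and cache-cold numbers over multiple sizes, in user-space), here is a minimal sketch. It is not the perf bench code. The names bench_copy(), evict_caches() and EVICT_BYTES are hypothetical, and the 32 MiB eviction buffer and 64-byte stride are assumptions about the test machine's caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Assumption: 32 MiB is larger than the last-level cache of the test machine. */
#define EVICT_BYTES ((size_t)32 << 20)

/* Wall-clock time in seconds (C11 timespec_get; not guaranteed monotonic). */
static double now_sec(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Touch a large scratch buffer so src/dst are likely pushed out of the caches. */
static void evict_caches(volatile unsigned char *scratch)
{
    for (size_t i = 0; i < EVICT_BYTES; i += 64)  /* assumed 64-byte lines */
        scratch[i]++;
}

/*
 * Run `iters` copies of `size` bytes through fn and return the total seconds
 * spent in the copies, or a negative value if allocation fails.
 * cold != 0 evicts the caches before every copy; cold == 0 leaves them warm.
 */
static double bench_copy(copy_fn fn, size_t size, int iters, int cold)
{
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    unsigned char *scratch = cold ? malloc(EVICT_BYTES) : NULL;
    copy_fn volatile f = fn;  /* volatile: keep the compiler from eliding calls */
    double total = 0.0;

    if (!src || !dst || (cold && !scratch)) {
        free(src); free(dst); free(scratch);
        return -1.0;
    }
    memset(src, 0xA5, size);
    memset(dst, 0, size);
    if (scratch)
        memset(scratch, 0, EVICT_BYTES);

    for (int i = 0; i < iters; i++) {
        if (cold)
            evict_caches(scratch);
        double t0 = now_sec();
        f(dst, src, size);
        total += now_sec() - t0;
    }

    assert(memcmp(dst, src, size) == 0);  /* sanity-check the last copy */
    free(src); free(dst); free(scratch);
    return total;
}
```

A caller would loop over a set of sizes and over each memcpy variant, calling bench_copy() once with cold = 0 and once with cold = 1, and report bytes copied divided by the returned time. For very small sizes the per-copy timer overhead dominates, so a real tool would need a finer-grained approach.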