Re: x86 memcpy performance

From: Andrew Lutomirski
Date: Mon Aug 15 2011 - 16:08:42 EST

On Mon, Aug 15, 2011 at 4:05 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Mon, Aug 15, 2011 at 03:11:40PM -0400, Andrew Lutomirski wrote:
>> > Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
>> > shows reasonable speedup there, we might need to make those work too.
>> I'm a little surprised that SSE beats fast string operations, but I
>> guess benchmarking always wins.
> If by fast string operations you mean X86_FEATURE_ERMS, then that's
> Intel-only and that actually would need to be benchmarked separately.
> Currently, I see speedup for large(r) buffers only vs rep; movsq. But I
> dunno about rep; movsb's enhanced rep string tricks Intel does.
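
For readers following along, here is a minimal user-space sketch of the two string-copy strategies being compared. It assumes x86-64 and GCC/Clang inline asm. The names copy_rep_movsq and copy_rep_movsb are hypothetical. This is not the kernel's memcpy or copy_from_user, which differ in fault handling, alignment handling and runtime patching. Variant one moves the 8-byte-aligned bulk with rep movsq and finishes the 0..7-byte tail with rep movsb. Variant two hands the whole length to a single rep movsb, which is the case ERMS is meant to make fast.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bulk copy with "rep movsq", tail with "rep movsb".
 * Relies on the ABI guarantee that the direction flag is clear on entry. */
static void *copy_rep_movsq(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    const void *s = src;
    size_t qwords = n >> 3; /* number of 8-byte moves */
    size_t tail = n & 7;    /* leftover bytes */
    /* RDI/RSI advance past the copied qwords, so the byte copy continues there. */
    __asm__ volatile("rep movsq"
                     : "+D"(d), "+S"(s), "+c"(qwords)
                     :
                     : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(tail)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* fallback on non-x86-64 builds */
#endif
    return dst;
}

/* Hypothetical helper: the whole length in one "rep movsb". */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    const void *s = src;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* fallback on non-x86-64 builds */
#endif
    return dst;
}
```

Any speed difference between the two, or versus an SSE loop, is exactly what would have to be benchmarked per CPU. The sketch only shows the mechanics.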

I meant X86_FEATURE_REP_GOOD. (That may also be Intel-only, but it
sounds like rep; movsq might move whole cachelines on CPUs at least a
few generations back.) I don't know if any ERMS CPUs exist yet.
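
For reference, ERMS is enumerated directly by CPUID: leaf 7, subleaf 0, EBX bit 9. X86_FEATURE_REP_GOOD, by contrast, is to my understanding a Linux-synthesized flag that the kernel sets from vendor/family checks, so there is no CPUID bit to query for it. Below is a small user-space sketch of the ERMS check. It assumes GCC or Clang's <cpuid.h> (__get_cpuid_max, __cpuid_count), and the function name cpu_has_erms is hypothetical.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: true if CPUID.(EAX=7,ECX=0):EBX[9] (ERMS) is set. */
static bool cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Leaf 7 only exists if the maximum basic leaf is at least 7. */
    if (__get_cpuid_max(0, 0) < 7)
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return ((ebx >> 9) & 1u) != 0;
#else
    return false; /* not x86: no ERMS */
#endif
}
```

A dispatcher could use this to pick a plain rep movsb path when the bit is set. Whether that actually wins still needs the separate benchmarking discussed above.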
