Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.

From: Ingo Molnar
Date: Fri Nov 13 2009 - 03:10:58 EST



* H. Peter Anvin <hpa@xxxxxxxxx> wrote:

> On 11/12/2009 11:33 PM, Ingo Molnar wrote:
> >
> > * Pavel Machek <pavel@xxxxxx> wrote:
> >
> >>> Ling, if you are interested, could you send a user-space test-app to
> >>> this thread that everyone could just compile and run on various older
> >>> boxes, to gather a performance profile of hand-coded versus string ops
> >>> performance?
> >>>
> > >>> ( And I think we can make a judgement based on cache-hot performance
> > >>> alone - if anything, the string ops will perform comparatively better in
> > >>> cache-cold scenarios, so the cache-hot numbers would be a conservative
> > >>> estimate. )
> >>
> > >> Ugh, really? I'd expect cache-cold performance not to be helped at all
> > >> (memory bandwidth limit), and you'll get a slowdown from additional
> > >> i-cache misses...
> >
> > That's my point - the new code is shorter, which will run comparatively
> > faster in a cache-cold environment.
> >
>
> memcpy_c by itself is by far the shortest variant, of course.

Yep. The argument I made was about comparing a long function against a
short one. As you noted, we don't actually enable the long function all
that often, which inverts the same argument.

> The question is if it makes sense to use the long variants for short
> (< 1024 bytes) copies.

I'd say not: the kernel executes in an icache-cold environment most of
the time (user space is far more cache-intensive in the majority of
workloads, and kernel processing starts with a cold icache), so
optimizing the kernel for code size is very important. (But numbers from
real workloads could convince me otherwise.)
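For the user-space test app requested earlier in the thread, here is a minimal sketch, assuming x86-64 and GCC or Clang. `copy_string_op` (a single `rep movsb`) stands in for the short string-op variant, and `copy_unrolled` (four 8-byte words per iteration plus a byte tail) stands in for a long hand-coded variant. Both names and the unrolling factor are hypothetical and are not taken from memcpy_64.S. The sketch times only the cache-hot case for sizes under 1024 bytes, so it says nothing about the icache-cold behaviour argued about here.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical short "string op" variant: one rep movsb on x86-64 with
 * GCC/Clang; falls back to the C library memcpy anywhere else. */
void *copy_string_op(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *ret = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    return memcpy(dst, src, n);
#endif
}

/* Hypothetical longer hand-coded variant: 32 bytes per iteration as four
 * 8-byte words, then a byte-at-a-time tail. The fixed-size memcpy calls
 * are the portable way to express unaligned word loads/stores in C. */
void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint64_t w0, w1, w2, w3;

    while (n >= 32) {
        memcpy(&w0, s, 8);
        memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);
        memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        d += 32; s += 32; n -= 32;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Returns 1 if both variants copy every size 0..max_n-1 correctly and
 * identically (max_n <= 4096), else 0. */
int check_variants(size_t max_n)
{
    static unsigned char src[4096], d1[4096], d2[4096];

    if (max_n > sizeof(src))
        return 0;
    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = (unsigned char)(i * 131u + 7u);
    for (size_t n = 0; n < max_n; n++) {
        memset(d1, 0, sizeof(d1));
        memset(d2, 0, sizeof(d2));
        copy_string_op(d1, src, n);
        copy_unrolled(d2, src, n);
        if (memcmp(d1, src, n) != 0 || memcmp(d1, d2, sizeof(d1)) != 0)
            return 0;
    }
    return 1;
}

/* Cache-hot timing: average ns per call when copying n bytes (n <= 4096,
 * iters > 0) over and over between the same two buffers. */
double ns_per_copy(copy_fn fn, size_t n, long iters)
{
    static unsigned char src[4096], dst[4096];
    struct timespec t0, t1;

    memset(src, 0x5a, sizeof(src));
    fn(dst, src, n); /* warm the caches once before timing */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        fn(dst, src, n);
        __asm__ volatile("" ::: "memory"); /* compiler barrier: keep every copy */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}

/* Print one comparison line for an n-byte copy. */
void report(size_t n, long iters)
{
    printf("%5zu bytes: string op %7.2f ns, unrolled %7.2f ns\n", n,
           ns_per_copy(copy_string_op, n, iters),
           ns_per_copy(copy_unrolled, n, iters));
}
```

A small `main` that calls `check_variants(1024)` once and then `report(n, 10000000)` for, say, n = 8, 64, 256 and 1023 would cover the < 1024 byte range on each box. GCC may recognize the byte tail loop and turn it back into a `memcpy` call, so building with `-O2 -fno-tree-loop-distribute-patterns` is probably needed for a fair comparison.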

Ingo