Re: [PATCH] Use x86 SSE instructions for clear_page, copy_page

From: Jens Maurer
Date: Sun Aug 22 2004 - 15:51:45 EST



David S. Miller wrote:
Time kernel builds on an otherwise totally idle machine.
Do multiple runs so that the cache gets loaded and you're
testing the memory accesses rather than disk reads.

I've applied Denis Vlasenko's patch that allows runtime
switching between "rep stosl" and SSE instructions.

I'm using this script to perform the kernel build:

rm mm/*.o kernel/*.o
make SUBDIRS="mm kernel"

Hardware: Pentium III 850 MHz with 256 MB RAM

Results of "time" using "rep stosl":

real 1m19.038s 1m18.099s 1m18.612s 1m18.630s
user 1m07.258s 1m07.056s 1m07.198s 1m07.052s
sys 0m10.062s 0m10.205s 0m10.141s 0m10.209s

oprofile result for "rep stosl" (obtained on a separate run):
samples % app name symbol name
497537 71.9851 cc1 (no symbols)
37358 5.4051 vmlinux-2.6-sse zero_page_std
6933 1.0031 vmlinux-2.6-sse __copy_to_user_ll
6010 0.8695 libc-2.3.3.so _int_malloc
5940 0.8594 vmlinux-2.6-sse copy_page_std
5262 0.7613 vmlinux-2.6-sse mark_offset_tsc


Results using SSE instructions (writes do not use
the caches):

real 1m16.724s 1m16.281s 1m16.681s 1m16.681s 1m16.207s
user 1m07.938s 1m07.752s 1m07.905s 1m07.916s 1m07.846s
sys 0m07.517s 0m07.866s 0m07.715s 0m07.753s 0m07.684s

oprofile result for SSE (obtained on a separate run):
501243 74.5966 cc1 (no symbols)
21533 3.2046 vmlinux-2.6-sse zero_page_sse
8166 1.2153 vmlinux-2.6-sse __copy_to_user_ll
5797 0.8627 libc-2.3.3.so _int_malloc
5344 0.7953 vmlinux-2.6-sse copy_page_sse
4757 0.7080 libc-2.3.3.so __GI_memset
4490 0.6682 vmlinux-2.6-sse mark_offset_tsc


Interpretation:
Real time is about 2 sec lower, user CPU is about 0.7-0.9 sec
higher, system CPU is about 2.3 sec lower.
Results from test runs fluctuate by about 0.2 sec, thus the
measured differences appear to be significant, and in favor
of using SSE instructions for clear_page(). copy_page()
was not used enough (compared to clear_page) to give a clear
picture.

It looks like a future patch should allow for independent
"rep stosl"/SSE configuration for clear_page() and copy_page().

Jens Maurer



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/