Arjan van de Ven wrote:
>
> In article <3A82D325.9068BC07@colorfullife.com> you wrote:
> > fast_clear_page() uses mmx instructions for clearing a page, what about
> > using sse instructions?
> > sse instructions can store 128 bit in one instruction, mmx only 64 bit.
>
> the sse FP registers might be lossy.
I thought that too, thus I only implemented memset(,0,) with sse.
But then I found this document on Intel website:
http://developer.intel.com/software/idap/media/pdf/copy.pdf
Intel recommends using sse registers for generic memcopy - they can't be
lossy.
> On my athlon, the in-kernel mmx
> functions are memory-bound (eg > 1 Gbyte/sec throughput)
>
You are using an Athlon with SDRAM?
A Pentium 4 has a 3.2 GB memory bus and I saw a benchmark that compared
mmx and sse memmove, and sse was _much_ faster.
I've implemented a user space sse copy_page, and:
* mmx is the slowest version! (~12000 cpu ticks/page).
* movsd is slightly faster (~11850 cpu ticks/page). Probably due to the
special 'rep movsd' optimization in the Pentium III (cpu notices that
ecx is large and switches to a cache line copy mode).
* sse is the fastest version (~ 11500 cpu ticks/page)
Everything with cold caches.
> Userspace program for the athlon code:
>
> http://www.fenrus.demon.nl/athlon.c
>
I'll check it.
-- Manfred - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:12 EST