Re: [CFT] faster athlon/duron memory copy implementation

From: erich@uruk.org
Date: Thu Oct 24 2002 - 14:01:54 EST


Matthias Welk <matthias.welk@fokus.gmd.de> wrote:

> Running on an Athlon XP2000+, ASUS A7V333, 768MB DDR2100:

...[snip]...

> 1019 [maw] (buruk) /tmp/athlon # athlon_test
> Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
>
> copy_page() tests
> copy_page function 'warm up run' took 18081 cycles per page
> copy_page function '2.4 non MMX' took 19487 cycles per page
> copy_page function '2.4 MMX fallback' took 19403 cycles per page
> copy_page function '2.4 MMX version' took 18086 cycles per page
> copy_page function 'faster_copy' took 11372 cycles per page
> copy_page function 'even_faster' took 11183 cycles per page
> copy_page function 'no_prefetch' took 7815 cycles per page
> 1020 [maw] (buruk) /tmp/athlon # athlon_test

Whoa! Hmm.

If I'm reading this right, with a processor speed of 1.666 GHz,
you're getting:

    (4096 bytes / 7815 clocks) * 1.666 GHz = 873 MB/sec

The perfect peak performance of your setup, if the cache implements
standard write-allocate behavior (the target cache line is read before it
is written because the write logic doesn't know you're going to overwrite
the whole line in cases like this), should be:

    MIN( Memory speed / FSB speed ) / 3 = 700 MB/sec

So what gives? Did I misinterpret the output of your program?
Is the test flawed?

--
    Erich Stefan Boleyn     <erich@uruk.org>     http://www.uruk.org/
"Reality is truly stranger than fiction; Probably why fiction is so popular"
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Oct 31 2002 - 22:00:24 EST