Jim> He also did a little loop unrolling in test1 which made a
Jim> tremendous performance difference:
roadrunner> ./newtest1 10
Jim> 1000 10 4.20 190.48
roadrunner> ./test1 10
Jim> 1000 10 0.58 137.14
The numbers Richard got on his 300MHz (?) eb164 for this new version
of test1 are interesting, too (Richard, I hope you don't mind me
quoting your results):
ecoff (gcc version 2.7.1 off the red hat 3.0.3 cd)
1000 10 3.65 219.18
elf shared (gcc version 2.7.2 snapshot 960602)
1000 10 3.76 212.78
LD_BIND_NOW=1
1000 10 3.65 219.04
elf static (same)
1000 10 3.41 234.73
I'm not sure yet we understand why the "elf static" binary is that
much faster, but ELF performance isn't really something anybody has
had time yet to look into, so it may be that we just get lucky.
Now where does the improvement come from? Well, gcc either is not
aggressive enough about using registers (of which the Alpha has plenty)
or its alias analysis is extremely conservative. The upshot of this
is that most loops in the tests I looked at are scheduled _exactly_
the way they are written in the sources---very often, a load is
immediately followed by a use. I looked at one function each in test1
and test3 and in both cases, just separating the loads from the stores
(and putting the actual operations in between) improved performance
tremendously. In test3, this reduced execution time of function
svmva() from 1.07 seconds down to 0.56 seconds on my Cabriolet (275MHz
21064a; the times were measured with gprof---so they should not be
taken literally, but the relative performance is usually accurate).
The overall improvement for test3 was less because I didn't have the
time to make the same kind of changes to the other functions.
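
To make the kind of change concrete, here is a minimal sketch. It is
not the actual svmva() code from test3; the function names and the
loop body are made up for illustration, and it assumes that x and y
do not overlap. The first version has each load immediately followed
by its use; the second is unrolled by four, with all loads first,
then the arithmetic, then the stores.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration; NOT the real svmva(). */

/* As written in the sources: every load is immediately followed by a use. */
void axpy_naive(size_t n, double a, const double *x, double *y)
{
    size_t i;

    for (i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}

/* Hand-scheduled: unrolled by four, loads grouped at the top, the
   operations in the middle, the stores at the end.  Assumes x and y
   do not overlap. */
void axpy_scheduled(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* loads */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];

        /* operations */
        y0 += a * x0;
        y1 += a * x1;
        y2 += a * x2;
        y3 += a * x3;

        /* stores */
        y[i] = y0;
        y[i + 1] = y1;
        y[i + 2] = y2;
        y[i + 3] = y3;
    }

    /* leftover elements when n is not a multiple of four */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Whether this helps on a given machine depends on the compiler and the
CPU, so it is worth measuring both versions before keeping the
rewrite.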
To summarize:
(1) When using the DEC C compiler, be sure it's operating in ANSI C
mode (-std1).
(2) When using gcc, it may be necessary to spend some time optimizing
the code for the Alpha to get performance similar to (or better
than) what DEC C achieves (cc -migrate -O4 -std1 or some such).
(3) gcc -O -Wall is a great assistant in tracking down 64-bit problems
in code.
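
As a small illustration of point (3), here is one typical 64-bit
pitfall. On the Alpha, int is 32 bits while long and pointers are 64
bits, so a printf-style format that says "%d" for a long argument is
wrong, and the format checking enabled by -Wall reports it. The
helper name below is hypothetical, and the sketch shows the corrected
form.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only. */
int format_long(char *buf, size_t len, long value)
{
    /* "%ld" matches the long argument.  Writing "%d" here instead is
       the kind of mismatch gcc -Wall flags; with 32-bit int and
       64-bit long it is a real bug, not just a style issue. */
    return snprintf(buf, len, "%ld", value);
}
```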
So, please, do _not_ jump out of the window! (Or is that "into
window(s)", these days?? ;-))
--david