Re: performance of 21164-300 vs P6-200

Bill Broadley (broadley@math.ucdavis.edu)
Mon, 10 Jun 1996 10:36:26 -0700 (PDT)


>
> The hardware was one of these PCI/ISA PC-style Alphas, with an EB164
> motherboard - the compiler flags that I explicitly specified were
> -O3 for the Alpha and -O for the P6 (my codes generally seem most
> happy with just -O on the P6) - the test that failed definitely does

I'd try -O on the Alpha; gcc has been known to be over-exuberant with
inlining, which can cause a lot of cache misses and decrease performance.

To be fair, I'd explore the optimization space on the Alpha as
thoroughly as you did on the P6.
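
A minimal sketch of such a comparison follows, assuming the test resembles
the loop described in the quoted message (1000 3D vectors rotated and
translated 1000 times). The names rotate_translate and time_benchmark, the
use of double, and the particular rotation are hypothetical, not taken from
the original test code. The idea is to build the same file with -O, -O2
and -O3 on each machine and compare the times:

```c
#include <stdio.h>
#include <time.h>

#define NVEC 1000   /* vectors per pass, as in the quoted test */

/* Hypothetical kernel: apply x' = R x + t to each of n vectors in place. */
static void rotate_translate(double v[][3], int n,
                             const double r[3][3], const double t[3])
{
    int i;

    for (i = 0; i < n; i++) {
        double x = v[i][0], y = v[i][1], z = v[i][2];
        v[i][0] = r[0][0] * x + r[0][1] * y + r[0][2] * z + t[0];
        v[i][1] = r[1][0] * x + r[1][1] * y + r[1][2] * z + t[1];
        v[i][2] = r[2][0] * x + r[2][1] * y + r[2][2] * z + t[2];
    }
}

/* Run `iters` passes over NVEC vectors and return the CPU seconds used.
 * Printing a checksum keeps the optimizer from discarding the work. */
static double time_benchmark(int iters)
{
    static double v[NVEC][3];
    /* Rotation about z with cos = 0.8, sin = 0.6 (a 3-4-5 triangle),
     * chosen so that no libm call is needed. */
    static const double r[3][3] = { { 0.8, -0.6, 0.0 },
                                    { 0.6,  0.8, 0.0 },
                                    { 0.0,  0.0, 1.0 } };
    static const double t[3] = { 1.0, 0.5, 0.0 };
    double sum = 0.0, secs;
    clock_t start;
    int i;

    for (i = 0; i < NVEC; i++) {
        v[i][0] = i;
        v[i][1] = 2.0 * i;
        v[i][2] = 1.0;
    }
    start = clock();
    for (i = 0; i < iters; i++)
        rotate_translate(v, NVEC, r, t);
    secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    for (i = 0; i < NVEC; i++)
        sum += v[i][0] + v[i][1] + v[i][2];
    printf("%d passes: %.3f s (checksum %g)\n", iters, secs, sum);
    return secs;
}
```

A main() that calls time_benchmark(1000) and prints the result is enough;
build it once per flag set (e.g. gcc -O, gcc -O2, gcc -O3) on each machine.
Because the checksum depends on every vector, the compiler cannot delete
the timed loop as dead code.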

> not compare equality of floats - it is a very straightforward loop -
> 1000 3D vectors rotated/translated 1000 times - it takes about 0.5
> seconds (the 1000 times can be multiplied by a scale to get more
> iterations) on a P5-90 - I aborted it on the Alpha after about a minute.
> I have run these tests on P5, P6, IBM RS6K, SGIs without any problems
> and any source code changes - the total source code size for tests 1
> to 3 is 675 lines C. I am kind of suspecting that the code generated
> by gcc (no f2c involved here) is plain wrong.

It's very possible. It's also possible that the code assumes a pointer
is the same size as an int, or makes some other 32-bit-pointer assumption.
I believe all of the machines above use 32-bit pointers, unless the SGIs
are running IRIX 6.x.
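
A quick way to see what the compiler on each machine assumes, as a
hypothetical helper (the function names are illustrative). On Linux/Alpha,
as on OSF/1, int is 32 bits while long and pointers are 64 bits, so code
that stores a pointer in an int can work on the 32-bit machines and break
on the Alpha:

```c
#include <stdio.h>

/* Hypothetical helper: name the C data model from the type sizes.
 * ILP32: int, long and pointers all 32 bits (typical 32-bit Unix).
 * LP64:  int 32 bits, long and pointers 64 bits (Linux/Alpha, OSF/1). */
static const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Returns 1 if an int is wide enough to hold a pointer.  When it is
 * not, code such as `int h = (int)ptr;`, or calling malloc() without a
 * prototype (so its result is treated as an int), typically loses the
 * upper bits of the address. */
static int pointer_fits_in_int(void)
{
    return sizeof(void *) <= sizeof(int);
}

static void report_type_sizes(void)
{
    printf("int %u, long %u, pointer %u bytes: %s\n",
           (unsigned)sizeof(int), (unsigned)sizeof(long),
           (unsigned)sizeof(void *), data_model());
    if (!pointer_fits_in_int())
        printf("pointers do not fit in an int on this machine\n");
}
```

If report_type_sizes() says LP64, compiling with gcc -Wall and reading
every pointer/integer conversion warning is a cheap first check before
blaming the code generator.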

Not that it couldn't be gcc, but I'm sure the linux-alpha list would be
interested in any verified linux-alpha-gcc bug.

> I am mainly interested in hearing how a 300 MHz 21164 does on floating
> point specifically under Linux and with f2c and gcc. I would hope that
> DEC Fortran and C under OSF/1 [do better], else I would wonder why a CPU rated at
> 512 SPECfp92 does at most 10 to 25 % better than one rated at 283.

I prefer SPEC95: 12.16 for the 300 MHz 21164 vs. 6.75 for the P6-200.
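
For what it's worth, the two suites give nearly the same ratio (arithmetic
on the figures quoted in this thread only), so on those numbers alone one
would expect the Alpha to be roughly 80% faster, not 10 to 25%:

```latex
\[
\frac{512}{283} \approx 1.81 \ (\text{SPECfp92}), \qquad
\frac{12.16}{6.75} \approx 1.80 \ (\text{SPEC95})
\]
```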

I wasn't really suggesting buying an OSF machine, just that if you
ran your benchmarks under OSF you would know the hardware's potential,
and could estimate how close Linux might come in the future.

Another possibility is that your benchmark fits in the P6's L2 cache (256 KB)
but not in the Alpha's L2 (96 KB). How large are the arrays you're working with?
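
As a rough check, assuming the test holds only the 1000 3D vectors from
the quoted description and uses double precision (8 bytes per component):

```latex
\[
1000 \times 3 \times 8\ \text{bytes} = 24{,}000\ \text{bytes} \approx 23.4\ \text{KB}
\]
```

Even with a separate output array (48,000 bytes, about 47 KB) that fits
in either L2, so if those are the only arrays, cache size alone probably
doesn't explain this particular test.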

To be fair, you might want to make sure they are much larger than either
cache, unless your actual production codes use data sets about that size.
That is, if your real applications fit in the P6's L2 but not the Alpha's,
and run faster on the P6, then the P6 is the best machine for you.
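
A minimal sketch of such a size sweep; the function name, constants and
sizes are illustrative assumptions. It runs the same rotate/translate
update over working sets below 96 KB, between 96 KB and 256 KB, and far
above both, holding the total number of vector updates constant, so the
cost per vector can be compared directly:

```c
#include <stdlib.h>
#include <time.h>

/* Written after timing so the compiler cannot drop the loop. */
static volatile double sink;

/* Hypothetical sweep helper: apply x' = R x + t (R a rotation about z)
 * to nvec vectors, repeating passes until about total_updates vector
 * updates are done.  Returns nanoseconds per vector update, or -1.0 if
 * nvec is 0, total_updates < nvec, or allocation fails. */
static double ns_per_vector(size_t nvec, long total_updates)
{
    static const double r[3][3] = { { 0.8, -0.6, 0.0 },
                                    { 0.6,  0.8, 0.0 },
                                    { 0.0,  0.0, 1.0 } };
    static const double t[3] = { 1.0, 0.5, 0.0 };
    long passes, p;
    double *v, sum = 0.0, secs;
    size_t i;
    clock_t start;

    if (nvec == 0)
        return -1.0;
    passes = total_updates / (long)nvec;
    v = malloc(3 * nvec * sizeof *v);
    if (v == NULL || passes < 1) {
        free(v);
        return -1.0;
    }
    for (i = 0; i < 3 * nvec; i++)
        v[i] = (double)(i % 100);

    start = clock();
    for (p = 0; p < passes; p++)
        for (i = 0; i < nvec; i++) {
            double *q = v + 3 * i;
            double x = q[0], y = q[1], z = q[2];
            q[0] = r[0][0] * x + r[0][1] * y + r[0][2] * z + t[0];
            q[1] = r[1][0] * x + r[1][1] * y + r[1][2] * z + t[1];
            q[2] = r[2][0] * x + r[2][1] * y + r[2][2] * z + t[2];
        }
    secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    for (i = 0; i < 3 * nvec; i++)
        sum += v[i];
    sink = sum;
    free(v);
    return secs * 1e9 / ((double)passes * (double)nvec);
}
```

For example, calling ns_per_vector(n, 10000000L) for n = 1000 (24 KB),
8000 (192 KB) and 1000000 (24 MB) on both machines shows whether either
one slows down sharply once the data stops fitting in its cache.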

Another possible explanation is that integer work (calculating indices,
array offsets, loop counters, etc.) is the limiting factor. The P6-200
is faster at integer code than the 21164-300.

In any case, I can easily believe that the Linux/Alpha gcc, gas, libm, etc.
need significant tuning, and that gcc produces bad code, but there
are other possible explanations as well.

-- 
Bill Broadley           Broadley@math.ucdavis.edu           UCD Math Sys-Admin
Linux is great.         http://ucdmath.ucdavis.edu/~broadley            PGP-ok