>I am not quite so sure - I must admit though I'm most familiar with
>the Intel assembler right now (least anyone care for some Univac 1106
>or PDP/11 assembler code :-) - I never made it to the VAX assembler -
>a while ago I downloaded a DEC note on AXP instruction scheduling and
>recall one example (if I remember it right) where a gratuitous NOP
>thrown in somewhere was supposed to do wonders.
The 21164 has 4 pipelines (2 int, 2 fp), and all 4 stall if there
is anything unresolved, i.e. waiting on cache or RAM. The 21164 has
3 kinds of no-ops to help with static scheduling of instructions,
so that you stall all 4 pipelines less often.
>I also found that the P6 is a lot less sensitive to gcc's follies
>than the P5 - if you look at gcc Intel assembler code produced
>from 'naive' high level language code, it makes you want to have
>a post-processor doing instruction scheduling.
Yes, the P6 is pretty good at making use of any inherent parallelism
in the code via out-of-order execution, non-blocking cache/RAM access,
register renaming to help with the register shortage, etc.
>Now out-of-order
>execution, register renaming and similar things on modern CPUs may
>take care of that to some extent
The 21164 doesn't have any of these. The question is whether other CPUs
that do have them manage a better job of finding parallelism in the code
with silicon in 5-10 ns than a good compiler does for the 21164. Granted,
one is dynamic and the other is static.
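
To make "parallelism in the code" concrete, here is a rough sketch in C
(my own illustration, not taken from any DEC or Intel note; the function
names sum_serial and sum_unrolled are made up). The first loop is one long
dependency chain, since every add needs the previous result. The second
splits the work into four independent chains, which is the kind of thing
either a static scheduler or an out-of-order core can overlap. Whether it
actually runs faster depends on the chip and the compiler, so treat it as
a sketch only.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the adds form a single serial chain. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent chains of adds. Nothing in one
   chain depends on another until the final combine. */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that the second version adds the elements in a different order, so
with floating point the two results can differ in the last bits. That is
one reason a compiler will not make this transformation for you unless
you tell it that reordering is acceptable.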
> (admittedly the x86 is horrendously
>short on registers - but I think it was designed when stacks were
>the latest rage - and the 'enter' and 'leave' instructions point in
>the same direction - trying to assist high-level languages).
The 8087 was designed with a stack implementation, which was reasonable
at the time, but there was a bug in its implementation which keeps people
from using it efficiently. I forget the exact details, but there is a good
discussion of it in Hennessy and Patterson's Computer Architecture: A
Quantitative Approach, 2nd edition (or something pretty close). Basically
the 8087 was broken, and everything since then has been compatible with it.
I believe it's something along the lines of not being able to tell if the
stack is full.
That is probably the main reason why a chip as aggressive as the P6 gets
about 1/2 the FPU performance of the 21164, PA-8000, MIPS R10000, and
UltraSPARC.
-- Bill Broadley Broadley@math.ucdavis.edu UCD Math Sys-Admin Linux is great. http://ucdmath.ucdavis.edu/~broadley PGP-ok