How well can GCC optimize instruction grouping for processors such as
the Alpha 21164A, Pentium, UltraSPARC, or any other chip that doesn't
include a reordering stage before feeding the pipeline?
GCC has a long way to go.