> > clc (as all the other flag-manipulation instructions) is non-pairable.
>
> on a ppro+ CLC is a 1 uop instruction, and as such "pairable". (check
> with an intel manual if you don't believe me). [...]
check out the Intel Optimization Guide, 24281603.pdf, page 121:

    CLC - Clear Carry Flag          NP

    NP - not pairable, executes in U-pipe.

> [...] Like i said, I've
> tried exchanging the testl for a clc, and, in one case, it _was_ faster.
> so i can believe it could be faster in andreas case too.
i can clearly demonstrate with code that the testl pairs nicely
while clc doesn't - and the Intel docs agree with me.
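
(a minimal sketch of why the two are interchangeable in the first place, not
the csum code itself: TEST always clears CF, so a testl on any register does
the same job as clc before an adcl chain. it assumes GCC or Clang on x86, and
the helper names carry_after_testl/carry_after_clc are made up for
illustration. it only shows the flag result, it says nothing about pairing.)

```c
/* Assumption: GCC/Clang extended asm on x86 or x86-64. */

/* Set CF with stc, run testl, then read CF back with setc.
 * TEST clears CF regardless of the operand value. */
static int carry_after_testl(void)
{
    unsigned char cf;
    unsigned int x = 1;          /* any value works; TEST ignores it for CF */

    __asm__ volatile(
        "stc\n\t"
        "testl %1, %1\n\t"
        "setc %0"
        : "=q"(cf)
        : "r"(x)
        : "cc");
    return cf;
}

/* Same experiment with clc in place of testl. */
static int carry_after_clc(void)
{
    unsigned char cf;

    __asm__ volatile(
        "stc\n\t"
        "clc\n\t"
        "setc %0"
        : "=q"(cf)
        :
        : "cc");
    return cf;
}
```

both helpers should report a cleared carry flag, so which one to use is purely
a scheduling question.
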
> [...you made me curious so i tried his latest patch w/ testl/clc...]
> Hmm, most of the time they come out equal, differences are in the noise
> and depend on measurement method (for a single csum run timed with
> rdtsc the results are almost always identical and rarely CLC wins,
> for 10 runs testl wins by a narrow margin).
unless you _really_ know how to measure pairing effects, be careful before
jumping to conclusions. It's very easy to mess up the measurement.
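
(a rough sketch of a less fragile way to time the two variants, under stated
assumptions: GCC or Clang on x86 with __rdtsc() from <x86intrin.h>, and a
short stand-in adcl chain instead of the real csum loop. min_cycles,
variant_testl and variant_clc are hypothetical names. a single rdtsc-timed
run mostly measures cold caches and interrupts, so this warms up once and
keeps the minimum over many runs. rdtsc is not serializing, so on an
out-of-order CPU even this is only approximate.)

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(); assumption: GCC/Clang on x86 */

typedef unsigned int (*bench_fn)(void);

/* Stand-in for the checksum inner loop: clear CF with testl,
 * then a short add-with-carry chain. Returns 0 + 1 + 1 + 0 = 2. */
static unsigned int variant_testl(void)
{
    unsigned int sum = 0, a = 1;

    __asm__ volatile(
        "testl %0, %0\n\t"
        "adcl %1, %0\n\t"
        "adcl %1, %0\n\t"
        "adcl $0, %0"
        : "+r"(sum)
        : "r"(a)
        : "cc");
    return sum;
}

/* Same chain, but CF is cleared with clc. */
static unsigned int variant_clc(void)
{
    unsigned int sum = 0, a = 1;

    __asm__ volatile(
        "clc\n\t"
        "adcl %1, %0\n\t"
        "adcl %1, %0\n\t"
        "adcl $0, %0"
        : "+r"(sum)
        : "r"(a)
        : "cc");
    return sum;
}

/* Time fn() `runs` times and return the smallest cycle count seen.
 * The minimum filters out runs disturbed by interrupts or cache misses. */
static uint64_t min_cycles(bench_fn fn, int runs)
{
    volatile unsigned int sink;   /* keeps the calls from being optimized away */
    uint64_t best = UINT64_MAX;

    sink = fn();                  /* warm-up run, not timed */
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = __rdtsc();
        sink = fn();
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    (void)sink;
    return best;
}
```

compare min_cycles(variant_testl, 100000) against
min_cycles(variant_clc, 100000), and repeat the whole comparison a few times
before trusting any difference. the rdtsc overhead is included in both
numbers, so only the difference between them means anything.
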
-- mingo