on a ppro+ CLC is a 1 uop instruction, and as such "pairable". (check
with an intel manual if you don't believe me). Like i said, I've
tried exchanging the testl for a clc, and, in one case, it _was_ faster.
so i can believe it could be faster in andreas case too.
[...you made me curious so i tried his latest patch w/ testl/clc...]
Hmm, most of the time they come out equal, differences are in the noise
and depend on measurement method (for a single csum run timed with
rdtsc the results are almost always identical and rarely CLC wins,
for 10 runs testl wins by a narrow margin).
Anyway, while the clc vs testl issue may be interesting, it alone
doesn't make much difference in RL... (for comparison: even using
another register as testl arg makes more difference :)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/