Hmmm, I think you've been reading the wrong timing documents. On the Pentium
and Pentium-II processors, a bswap is 1 cycle. If you don't use bswap, you
typically do two xchgb's and a rorl. Because the xchgb's modify bytes within
the same register, it will actually take longer than one cycle (this is a no-no
under the p5/p6). Granted, on AMD processors the bswap instruction takes 4
cycles, so it is less desirable. As a data point, when I added the use of
bswap to swap bytes in the powerpc simulator, it did speed up the simulator by
a few percentage points. One possibility for using it in compiler-generated code
is having an explicit endian attribute that allows you to declare a particular
endian orientation for a variable, which is useful in dealing with network
data and real hardware. Note, the endian attribute keeps coming up every six
months or so, and sooner or later we will fix the internal problems in GCC that
prevent us from considering it.
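
For anyone following along, here is a rough C sketch of the two ways of
swapping a 32-bit value discussed above. The function names are made up for
illustration, and the xchgb / rorl $16 / xchgb ordering is my assumption about
the usual three-instruction sequence. This is portable C that mirrors the
steps, not the actual instructions; whether a compiler turns the first form
into a single bswap depends on the compiler and target.

```c
#include <stdint.h>

/* Portable shift-and-mask byte swap (hypothetical helper name).
 * This is the operation a single bswap instruction performs. */
static uint32_t swap32_shift(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Exchange the two bytes of a 16-bit value, the effect of an
 * xchgb between the low and high byte of a 16-bit register. */
static uint16_t xchg_bytes16(uint16_t h)
{
    return (uint16_t)((h << 8) | (h >> 8));
}

/* Mirrors the assumed fallback sequence: xchgb, rorl $16, xchgb. */
static uint32_t swap32_xchg_ror(uint32_t x)
{
    /* First xchgb: swap the two bytes of the low 16 bits. */
    x = (x & 0xffff0000u) | xchg_bytes16((uint16_t)(x & 0xffffu));
    /* rorl $16: rotate so the old high half becomes the low half. */
    x = (x >> 16) | (x << 16);
    /* Second xchgb: swap the two bytes of the new low 16 bits. */
    x = (x & 0xffff0000u) | xchg_bytes16((uint16_t)(x & 0xffffu));
    return x;
}
```

Both functions should give the same result for any input; the second one just
makes it visible that each byte exchange rewrites part of a register that the
next step reads in full.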
--
Michael Meissner, Cygnus Solutions (Massachusetts office)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meissner@cygnus.com, 617-354-5416 (office), 617-354-7161 (fax)