On Tue, 11 Feb 2003 13:37:07 EST, dank@suburbanjihad.net (nick black) said:
> firstly, to what domain of checksums does this comment apply? secondly,
> why is it true? it seems the PADDW family of instructions could work
> well here; is the slowdown a result of the kernel's need to muck with
> fpu state (from what i can tell, mmx uses the fp registers)?
(Note - second-hand info from somebody else who looked at MMX/SSE to optimize
an inner loop. Double-check with CPU documentation).
There's a big "urp" sound as the processor switches from FP to MMX mode and
back, which apparently takes a large number of cycles. You can to some extent
amortize this if you're switching once for a LONG loop (the analysis I saw was
with a million or so pixels on a screen image) - if you're switching in and out
for a 1500 byte packet (or even worse, a 100-byte packet) the impact may be
more noticable. You may wish to examine the SSE/SSE2 opcodes, which apparently
don't take this performance hit.
-- Valdis Kletnieks Computer Systems Senior Engineer Virginia Tech
This archive was generated by hypermail 2b29 : Sat Feb 15 2003 - 22:00:35 EST