> On Tue, Sep 08, 1998 at 04:00:28PM -0500, Oliver Xymoron wrote:
> > Then
> > processing 8 characters can be (no idea what the opcodes are anymore)
> >
> > load *p to a (1 cycle)
> > extend a to b and c (I think this is two 1 cycle instructions)
> > add b to d (1 cycle)
> > add c to d (1 cycle)
> > increment p (1 cycle)
>
> Impressive hard work, but a Pentium can already do:
>
> movl p[0] to a (1/2 cycle)
> adcl b to s (1/2 cycle)
> movl p[1] to b (1/2 cycle)
> adcl a to s (1/2 cycle)
> etc.
>
> to checksum 4 bytes per cycle (roughly), as the code in
> arch/i386/lib/checksum.c does. Am I wrong?
Probably. On a Pentium, the data bus is 64 bits wide, and the only
instructions that do 64-bit fetches to registers are the FPU and MMX
instructions. This is why the FPU memcpy ran about twice as fast as the
standard one. The reason to use MMX for the checksum is not the speed of
the arithmetic but the extra bus bandwidth.
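To make the wide-fetch idea concrete, here is a rough sketch in portable C
rather than MMX. The name csum_wide is made up for illustration, it assumes
the length is a multiple of 8, and it is not the code in
arch/i386/lib/checksum.c. It pulls 64 bits per load, keeps the carry that
falls off the top, and folds down to 16 bits at the end:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: ones-complement sum of len bytes, read 64 bits
 * at a time and folded to 16 bits. Assumes len is a multiple of 8.
 * The result is in native byte order. */
static uint16_t csum_wide(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, 8);   /* one 64-bit fetch, alignment-safe */
        sum += w;
        if (sum < w)            /* carry out of bit 63 */
            sum++;              /* end-around carry */
    }

    /* Fold 64 -> 32 -> 16, adding each carry back in. */
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Whether this beats paired 32-bit adcl on a real Pentium depends on how the
compiler issues the loads, so treat it as an illustration of the arithmetic,
not a benchmark claim.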
> The really interesting case is copy-and-checksum. How fast can you do
> that?
Add a store and increment to the above. However, as HPA points out, IP
checksums are one's complement sums, not two's complement, so the above
algorithm is broken anyway.
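For reference, a minimal sketch of copy-and-checksum with the carries handled
the one's complement way. The name csum_copy is invented for this example,
and it assumes an even length of at most 128 KiB so the 32-bit accumulator
cannot overflow:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to dst and return the
 * 16-bit ones-complement sum of the data (native byte order).
 * Assumes len is even and no larger than 128 KiB. */
static uint16_t csum_copy(unsigned char *dst, const unsigned char *src,
                          size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t w;
        memcpy(&w, src + i, 2);   /* load */
        memcpy(dst + i, &w, 2);   /* store */
        sum += w;                 /* carries pile up in the top half */
    }

    /* End-around carry: add the overflow back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

A plain two's complement add would simply drop those carries, which is why
the sequence quoted above gives the wrong answer whenever a 16-bit add
overflows.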
-- "Love the dolphins," she advised him. "Write by W.A.S.T.E.."