ia32 ip checksum optimizations

Artur Skawina (skawina@geocities.com)
Tue, 25 May 1999 16:18:12 +0200


Andrea Arcangeli wrote:
>
> It includes my latest asm 686 changes in the cksum routine

+ * Andrea Arcangeli, fixed a potential buffer overflow in the
+ * 686 csum_partial, and some improvement in the 686 code.

Ugh, your 'improvement' makes csum_partial up to ~2% slower...
You have removed the check for non-32bit aligned buffers [1], and
added several branches when 'len' isn't a multiple of four.
Have you actually benchmarked the code with your changes?...
What 'potential buffer overflow'?
[I haven't looked at your csum_partial_copy_generic() changes
in detail; the loop unrolling itself is likely a win, but
there are other things that I would look into first]
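
For anyone following along who wants to see what the alignment check
and the 'len' tail handling amount to, here is a minimal portable C
sketch. It is NOT the kernel's 686 assembly; the names fold16,
csum_ref16 and csum_aligned32 are made up for illustration, and the
bytes are combined in big-endian order purely so the result is the same
on any host. The head step only mirrors the structure of the check (one
16-bit word is consumed when the pointer is 2 mod 4, so the main loop
starts on a 32-bit boundary).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide one's-complement accumulator down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference (hypothetical): add big-endian 16-bit words one at a time.
 * A trailing odd byte is treated as the high byte of a final word. */
static uint16_t csum_ref16(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;

    assert(p != NULL || len == 0);
    for (; len >= 2; p += 2, len -= 2)
        sum += (uint32_t)p[0] << 8 | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;
    return fold16(sum);
}

/* Same sum, split into head / 32-bit body / tail (hypothetical). */
static uint16_t csum_aligned32(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;

    assert(p != NULL || len == 0);

    /* Head: pointer is 2 mod 4, so take one 16-bit word first. */
    if (((uintptr_t)p & 2) && len >= 2) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }

    /* Body: four bytes per iteration. */
    for (; len >= 4; p += 4, len -= 4)
        sum += (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
               (uint32_t)p[2] << 8 | p[3];

    /* Tail: 'len' was not a multiple of four. */
    if (len >= 2) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;

    return fold16(sum);
}
```

Both routines should agree for every offset and length, since adding
32-bit words and folding gives the same one's-complement sum as adding
16-bit words. Whether the head check pays for itself is exactly what has
to be measured.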

The tool I used to optimize the ia32 checksum routines should be
available at

http://www.geocities.com/SiliconValley/Heights/6494/sw/iackk.tgz [14k]

I was going to work on it a bit more, but since people are
posting patches that either break the routines completely or
make them slower...

It contains a collection of csum_partial() routines, a program
to measure their relative speed, and a patch to gather real life
usage statistics.
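
As a rough idea of what measuring relative speed involves, here is a
minimal sketch in portable C. It is not how iackk itself works (the
output below suggests it reads the TSC; this sketch uses clock()
instead), and csum_fn, csum_bytes16, bench_result and bench_csum are
hypothetical names. Every candidate is run over the same buffer, the
best of several runs is kept, and the checksum is returned so a broken
routine is noticed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature assumed for every candidate routine (hypothetical). */
typedef uint32_t (*csum_fn)(const unsigned char *buf, size_t len,
                            uint32_t sum);

/* Stand-in candidate: plain 16-bit big-endian one's-complement sum. */
static uint32_t csum_bytes16(const unsigned char *buf, size_t len,
                             uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
        sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
    }
    if (i < len) {                            /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return sum;
}

struct bench_result {
    clock_t best;       /* smallest elapsed clock() ticks over all runs */
    uint32_t checksum;  /* last value returned, for a sanity check */
};

/* Time 'iters' calls of fn, repeated 'runs' times; keep the best run. */
static struct bench_result bench_csum(csum_fn fn, const unsigned char *buf,
                                      size_t len, int runs, int iters)
{
    struct bench_result r = { 0, 0 };
    volatile uint32_t sink = 0;   /* keeps the calls from being elided */

    assert(runs > 0 && iters > 0);
    for (int run = 0; run < runs; run++) {
        clock_t t0 = clock();
        for (int i = 0; i < iters; i++)
            sink = fn(buf, len, 0);
        clock_t dt = clock() - t0;
        if (run == 0 || dt < r.best)
            r.best = dt;
    }
    r.checksum = sink;
    return r;
}
```

Calling bench_csum() once per candidate with the same buffer (say 1480
bytes) and comparing the 'best' fields gives relative numbers; a
'checksum' that differs between candidates means one of them is broken.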

I would be interested in seeing how the routines do on
different CPUs (both Intel and any clones), so if you
would like to see a routine that does better on your CPU,
download the tarball, read the README, and make a report.

And if you want to play with the existing routines, or make
a new one, iackk makes it a lot easier and safer.

ATM I don't think the 686 code can be pushed much further than
csum_partial_686as1 does; still, unrolling csum_partial_586
looks promising, if you're only optimizing for speed, not size.
I guess the pentium classic/mmx code could do better.

artur

[1] this can actually be a _small_ win, but the potential loss
is much bigger, so it doesn't look like it's worth it. See
the csum_partial_686as1 and csum_partial_686as1a numbers.

IACKK 0.9.9 Artur Skawina <skawina@geocities.com>
TIME(N+S)  TIME(1480)  TIME(XXX)  CHECKSUM  FUNCTION ( rdtsc_overhead=1 null=0 )
   104869       98839      99206      5277  csum_partial_cdumb16
    29884       22555      24141      5277  csum_partial_std
    20493       17733      18702      5277  csum_partial_686
    20524       17718      18687      5277  csum_partial_686copy
    20326       17693      18592      5277  csum_partial_686as1s
    19893       17336      18325      5277  csum_partial_686as1
    19934       17254      22263      5277  csum_partial_686as1a
    20573       17432      18703      5277  csum_partial_586s
    20438       17366      18348      5277  csum_partial_586
    20556       17387      18553      5277  csum_partial_586e
    22279       19386      20019      5277  csum_partial_586f
    20345       17105      22092      5277  csum_partial_rjas
    20604       17408      22663      5277  csum_partial_686aa
    22394       17333      22458      5277  csum_partial_rj
