Re: [patch] checksum P6 asm buffer overflow fix + 686 improvements

Artur Skawina (skawina@geocities.com)
Wed, 26 May 1999 17:33:39 +0200


Ingo Molnar wrote:
>
> it, smaller chunks are rather common. The performance difference between
> 64-byte and 128-byte chunks is mostly only visible if everything is fully
> cached (like in your benchmark). In cold-cache situations it's invisible.
> The hot-cache benchmark is rather misleading, because according to that
> benchmark the fastest routine is one that is unrolled to handle a whole MTU
> sized packet. Going from 64 bytes to 128 bytes causes 192 bytes more
> icache footprint, i dont think this is worth it.

Just to give some concrete numbers: a fully unrolled (up to 2048 bytes)
csum_partial() does about 5% better than the current stock 686 code for
1480-byte lengths. The size of the routine increases from (iirc - i
measured this yesterday) 23x bytes to 23xx bytes...
[note that for short buffers, only a small part of the routine will
be executed though - so the icache footprint may not be much
bigger for these cases. still...]
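
To make the unrolling trade-off concrete, here is a minimal sketch in
portable C. It is not the kernel's csum_partial() asm. The names
fold16, csum_simple and csum_unrolled are made up for illustration,
and len is assumed to be even. The unrolled loop eats 16 bytes per
iteration, so it has fewer loop branches but a bigger body, which is
the same speed vs. icache-footprint trade discussed above, on a much
smaller scale.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide one's-complement sum down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: one 16-bit word per iteration (len assumed even). */
static uint16_t csum_simple(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t w;
        memcpy(&w, buf + i, 2);   /* alignment-safe load */
        sum += w;
    }
    return fold16(sum);
}

/* Unrolled version: four 32-bit words (16 bytes) per iteration. */
static uint16_t csum_unrolled(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        uint32_t w[4];
        memcpy(w, buf + i, 16);
        /* 2^16 == 1 (mod 0xffff), so adding 32-bit words folds to
         * the same 16-bit result as adding their 16-bit halves. */
        sum += w[0];
        sum += w[1];
        sum += w[2];
        sum += w[3];
    }
    /* Tail: whatever is left, one 16-bit word at a time. */
    for (; i + 1 < len; i += 2) {
        uint16_t w;
        memcpy(&w, buf + i, 2);
        sum += w;
    }
    return fold16(sum);
}
```

Both functions return the folded sum in host byte order, without the
final complement. Timing them on hot vs. cold buffers is left out on
purpose, since the numbers depend heavily on the CPU and cache state.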

> doing fast MMX TCP checksums is possible, even if the MMX engine doesnt
> have a carry logic, this is from a csum routine i wrote a year ago:

> demonstrates the method nicely), but i finally found that the FPU handling
> complexity is simply not worth it. More and more networking cards are
> doing IP checksumming anyway.

MMX is probably not worth it for the checksum alone, but for
the checksum&copy case it could, maybe, be a win. Is your MMX code
available somewhere?
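
For reference, one common way around the missing carry logic is to
widen each 16-bit word into a 32-bit lane before adding, so the
carries pile up in the spare upper bits and get folded once at the
end. The sketch below shows that idea in plain C for the
checksum&copy case. It is not the routine quoted above (that code is
not included in this message), the name csum_copy_lanes is
hypothetical, and the two uint32_t lanes only stand in for the two
halves of a 64-bit MMX register. len is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes (len assumed even) from src to
 * dst and return the folded 16-bit one's-complement sum of the data,
 * in host byte order and without the final complement.
 */
static uint16_t csum_copy_lanes(unsigned char *dst,
                                const unsigned char *src, size_t len)
{
    /* Two 32-bit accumulators, like the lanes of one MMX register.
     * Each lane only ever receives 16-bit values, so it cannot
     * overflow for buffers up to 256 KiB, far beyond any MTU. */
    uint32_t lane[2] = { 0, 0 };
    uint64_t sum;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        uint16_t w[2];
        memcpy(w, src + i, 4);    /* load two 16-bit words */
        lane[0] += w[0];          /* widened add, no carry needed */
        lane[1] += w[1];
        memcpy(dst + i, w, 4);    /* store the same data: the copy */
    }
    if (i + 2 <= len) {           /* one trailing 16-bit word */
        uint16_t w;
        memcpy(&w, src + i, 2);
        lane[0] += w;
        memcpy(dst + i, &w, 2);
    }

    /* Combine the lanes, then fold the carries back in. */
    sum = (uint64_t)lane[0] + lane[1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

In a real MMX version the widening would presumably be done with
unpack instructions and the adds with packed 32-bit adds, plus the
FPU state save/restore that makes the whole thing questionable.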

artur
