Re: IP Checksumming

Systemkennung Linux (linux@mailhost.uni-koblenz.de)
Fri, 22 Nov 1996 01:07:44 +0100 (MET)


> I.e. the fastest possible C code can process 16 bits in each iteration,
> if running on a 32-bit cpu, 32 bits on a 64-bit cpu.

No.

> If you have conditionally executed opcodes, or a conditional move, then
> you can gain back the full register width, i.e. you can do something
> like this efficiently:

You can also do this if you have some kind of "set conditional" operation.

> unsigned long data[];
> ...

> unsigned long sum, new_sum;
>
> for (i = sum = 0; i < len; i++) {
>         new_sum = sum + data[i];
>         if (new_sum < sum) /* Did we get carry/overflow from the add? */
>                 new_sum++;
>         sum = new_sum;
> }
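A self-contained version of the quoted loop, as a sketch: the function name `csum_branchy` and the use of fixed-width `uint32_t` words instead of `unsigned long` are assumptions made for illustration. It accumulates a 32-bit one's-complement sum, detecting the carry out of each add by checking whether the result wrapped below the old sum, and it still contains the data-dependent branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit one's-complement sum using the
   branch-based carry test from the quoted loop. */
uint32_t csum_branchy(const uint32_t *data, size_t len)
{
    uint32_t sum = 0, new_sum;
    size_t i;

    for (i = 0; i < len; i++) {
        new_sum = sum + data[i];   /* unsigned add wraps modulo 2^32 */
        if (new_sum < sum)         /* result wrapped, so the add carried out */
            new_sum++;             /* end-around carry; cannot wrap again */
        sum = new_sum;
    }
    return sum;
}
```

For example, summing the words 0xFFFFFFFF and 0x00000001 wraps to 0, the test fires, and the end-around carry yields 1.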

Here you have an if, and it is one on which, I would guess, branch
prediction like that on the PentiumPro/R10000 etc. will perform pretty
badly, since whether a given add carries depends on the data. Conditional
move is an operation that is not available on many CPUs; some have it
only for floating point, and still others implement it only in newer
versions of their CPU architectures.

csum += data[i];            /* may wrap around */
carry = (csum < data[i]);   /* 1 if the add wrapped, else 0 */
csum += carry;              /* end-around carry */

does not need branches or other pipeline killers at all on most CPUs.
In fact, on MIPS this even compiles to really nice code, which takes four
cycles per 4 bytes on 32-bit CPUs or four cycles per 8 bytes on R4000
(64-bit) CPUs; the R5000/R10000 should be able to do the same in 2 cycles.

No hardware carry flag or assembler needed, pretty fast and portable.
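A self-contained sketch of the branch-free variant above, extended with the final step an IP checksum needs: folding the 32-bit one's-complement sum down to 16 bits (the folding property described in RFC 1071) and complementing it. The function names `csum_words` and `csum_fold`, the `uint32_t` word type, and the assumption that the data is already an array of aligned 32-bit words are illustrative choices, not part of the original post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: branch-free 32-bit one's-complement sum,
   following the csum/carry pattern above. */
uint32_t csum_words(const uint32_t *data, size_t len)
{
    uint32_t csum = 0, carry;
    size_t i;

    for (i = 0; i < len; i++) {
        csum += data[i];             /* may wrap modulo 2^32 */
        carry = (csum < data[i]);    /* 1 iff the add wrapped, else 0 */
        csum += carry;               /* end-around carry; cannot wrap again */
    }
    return csum;
}

/* Hypothetical helper: fold the 32-bit sum to a 16-bit one's-complement
   sum, then complement it to get the value for the checksum field. */
uint16_t csum_fold(uint32_t sum)
{
    sum = (sum & 0xFFFFu) + (sum >> 16);  /* may leave a carry in bit 16 */
    sum = (sum & 0xFFFFu) + (sum >> 16);  /* absorb that carry */
    return (uint16_t)~sum;
}
```

For example, the eight bytes 00 01 f2 03 f4 f5 f6 f7, read as the big-endian words 0x0001f203 and 0xf4f5f6f7, give a folded sum of 0xddf2, so `csum_fold` returns 0x220d. The loop body is just a load, an add, a set-on-less-than-unsigned comparison and a second add, which is why it maps onto MIPS without branches.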

Ralf