Re: ia32 ip checksum optimizations

Artur Skawina (skawina@geocities.com)
Tue, 25 May 1999 22:20:20 +0200


Andrea Arcangeli wrote:
>
> On Tue, 25 May 1999, Artur Skawina wrote:
>
> >Ugh, your 'improvement' makes csum_partial upto ~2% slower...
> >You have removed the check for non-32bit aligned buffers [1], and
>
> Here after my patch the csum_partial performances are almost the same. If
> something my one it's a bit faster. Maybe you are running a different CPU?
> I use a PII Deschutes.

it's faster for the aligned buffer, multiple-of-four len case, it's
slower when len&3, and it's a lot slower for the unaligned buffer case.

> About the non-32bit aligned %esi, it was _not_ needed here.

see my previous msg, for explanation.

(basically, you get an impressive 0.5% speed increase ;),
but take a significant hit (44%) in case the buffer isn't
properly aligned)

> And btw, I think %esi is going to be aligned.

and you have verified this?

> >added several branches when 'len' isn't a multiple of four.
>
> That's __the__ bugfix for the buffer overflow in csum_partial.
>
> You are avoiding the two branches by adding a plain buffer _overflow_, so
> please don't claim to go faster since your code works only by _luck_.
> Before ever make comparison with my code and your code, please make sure
> that you are comparing my code with good code, and not with buggy code as
> the old 686 csum_partial was. Otherwise I can't be interested on your

define 'buffer overflow' :)

i found this amusing, but (a) it's not my code, and (b) just because
you might not know what it does does not make that code 'buggy'.

> Sure seems faster here. Please read the numbers I posted in my emails with
> the patch.

haven't seen any such post yet.

> >What 'potential buffer overflow'?
>
> andl.

and? :)

> You obviously have _not_ yet seen the buffer overflow I spotted and then
> fixed with my patch. When you'll have seen it, then I suggest you to make
> sure that your experimental chksums are not buggy, and if they are buggy
> then fix it, and make sure to repeat the benchmarks. Thanks.

well, if you really expect anybody to look at this why don't you
show a case where the original code fails... Since it works, as you say,
"only by _luck_" this shouldn't be very hard :^)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/