it's faster for the aligned buffer, multiple-of-four len case, it's
slower when len&3, and it's a lot slower for the unaligned buffer case.
> About the non-32bit aligned %esi, it was _not_ needed here.
see my previous msg for the explanation.
(basically, you get an impressive 0.5% speed increase ;),
but take a significant hit (44%) in case the buffer isn't
properly aligned)
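
for anyone following along, the three cases i keep referring to can be
told apart with two bit tests. this is only a rough C sketch, not the
kernel code: the enum and the function name classify_csum_input are
made up for illustration, and the only thing taken from this thread is
the "aligned buffer", "len&3" and "unaligned buffer" split.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used only to label the cases discussed above. */
enum csum_case {
    CSUM_ALIGNED_MULT4,  /* 32-bit aligned buffer, len a multiple of four */
    CSUM_ALIGNED_TAIL,   /* 32-bit aligned buffer, len & 3 is non-zero    */
    CSUM_UNALIGNED       /* buffer address is not 32-bit aligned          */
};

static enum csum_case classify_csum_input(const void *buf, size_t len)
{
    /* The low two address bits are zero only for a 32-bit aligned buffer. */
    if ((uintptr_t)buf & 3)
        return CSUM_UNALIGNED;

    /* len & 3 is the number of tail bytes left after whole 32-bit words. */
    if (len & 3)
        return CSUM_ALIGNED_TAIL;

    return CSUM_ALIGNED_MULT4;
}
```

a benchmark that only ever feeds the first case says nothing about the
other two, which is why the numbers above are split this way.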
> And btw, I think %esi is going to be aligned.
and you have verified this?
> >added several branches when 'len' isn't a multiple of four.
>
> That's __the__ bugfix for the buffer overflow in csum_partial.
>
> You are avoiding the two branches by adding a plain buffer _overflow_, so
> please don't claim to go faster since your code works only by _luck_.
> Before you ever compare my code with your code, please make sure
> that you are comparing my code with good code, and not with buggy code as
> the old 686 csum_partial was. Otherwise I can't be interested on your
define 'buffer overflow' :)
i found this amusing, but (a) it's not my code, and (b) just because
you might not know what it does does not make that code 'buggy'.
> Sure seems faster here. Please read the numbers I posted in my emails with
> the patch.
haven't seen any such post yet.
> >What 'potential buffer overflow'?
>
> andl.
and? :)
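
to make the disagreement concrete, here is a rough C sketch of the two
ways to pick up the last len&3 bytes. it is not the 686 csum_partial
code; both function names are invented, and i am assuming that the
"andl" being pointed at masks a full 32-bit load down to the valid
tail bytes (little-endian byte order, as on x86).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: load a whole word, then mask off the bytes past the end.
 * rem must be 1..3. Note that p[0..3] are all read, even when rem < 4,
 * so the caller has to guarantee that four bytes are readable at p. */
static uint32_t tail_word_masked(const unsigned char *p, size_t rem)
{
    uint32_t w = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    uint32_t mask = 0xffffffffu >> (8 * (4 - rem)); /* the "andl" step */

    return w & mask;
}

/* Hypothetical: read only the rem valid bytes, at the cost of a loop
 * (or extra branches) on the tail. rem must be 1..3. */
static uint32_t tail_word_bytewise(const unsigned char *p, size_t rem)
{
    uint32_t w = 0;
    size_t i;

    for (i = 0; i < rem; i++)
        w |= (uint32_t)p[i] << (8 * i);

    return w;
}
```

both return the same value; they differ only in which bytes they
touch. whether the extra bytes read by the first variant can ever lie
outside a readable area in the real callers is exactly the question
that needs a failing case to answer.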
> You obviously have _not_ yet seen the buffer overflow I spotted and then
> fixed with my patch. Once you have seen it, I suggest you make
> sure that your experimental chksums are not buggy, and if they are buggy
> then fix it, and make sure to repeat the benchmarks. Thanks.
well, if you really expect anybody to look at this why don't you
show a case where the original code fails... Since it works, as you say,
"only by _luck_" this shouldn't be very hard :^)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/