>It does test for 32bit aligment - this routine can not be called
>with buf&1, so it's enough to just check this bit.
Where is enforced that such routine is never called with buf&1? Just to
produce an example: changing kmalloc internals will allow csum_partial to
break. Also changing the define of MAX_HEADER for a new protocol would
break the csum.
I instead assume that %esi is 32bit aligned _only_ to optimize the
routine. My csum will be slower if %esi is not aligned but it won't break.
Relying on the 16bit alignemnt of the buffer to avoid buffer overflows
seems really _not_ robust enough to me.
>Hmm, do you realize that this directly contradicts your other theory? ;)
That wasn't my own theory, but it's been due a mistake I did in the
benchmarking of %esi not 32bit aligned :-). But again I was not worried
about %esi not 32bit aligned because in the _common_ case %esi will _be_
32bit aligned.
>But seriously, all data I have gathered so far indicates that the
>buffer is always 32bit aligned. But until this is confirmed
>to be guaranteed, it's too early to remove that check; esp. since
>the gain is so microscopic.
The whole point is that such kind of routine should work fine without
depending on the alignment of the buffer. Would you also start hacking
memcpy to work only if the source is 32bit aligned? :-)
>one of the reasons for that patch (in the iackk tarball) was to identify
>such cases, but it hasn't found one. Still that doesn;t necessarily mean
Such cases should be indentified in the sources of the network code. But
even if they are not present now you can't catch new code that may be
developed in the future.
>the <686 csum_partial_copy_generic already does align the destination.
Where?? The only time %edi is changed is after every unrolled loop where
is incremented of 64 bytes.
>Now, with the alignment issue behind, how about concentrating
Do I convinced you that we shouldn't rely on the %esi 16bit alignement?
Really also enforcing %esi to be 32bit aligned before starting the
unrolled loop, and then doing the read-only-overflow-controlled (that in
such case wouldn't oops), looks a bit dirty to me, but I could agree with
such way to get some more bit of performances (to break that you should
change the alignment of pages in memory and not only develop some
different thing in the network side or change kmalloc, and considering
it's 686 specific stuff I guess it won't ever break :-).
Maybe you only think that I am too much paranoid ? :-)
Andrea Arcangeli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/