It does test for 32bit aligment - this routine can not be called
with buf&1, so it's enough to just check this bit.
Note that this check is required to optimize away the branches
at the end (the part you removed).
> You now should note that %esi maybe equal to 1 or 0x11 or 0x21 and you'll
> go ahead in the unrolled loop. That's not exactly the bug but now I think
no, this code can simply not be called with such a buffer.
> to see why you don't see the `andl' buggy. Yes, if %esi would be 4byte
> aligned then the andl wouldn't trigger an Oops or a segfault in userspace,
> even if was reading in overflow after the buffer boundary for some byte,
> because pages of memory are always 4byte aligned. But with the current
correct.
> To check if %esi is 4byte aligned you should do:
>
> testl $3, %esi
>
> instead, and then rewrite the slow path to 4byte align %esi (instead of
> computing the checksum for two bytes and go ahead as happens now).
the current code _is_ enough.
> >> And btw, I think %esi is going to be aligned.
> >
> >and you have verified this?
>
> The skb->data is going to be aligned because skb->data originally is a
> kmalloced area (kmalloc return regions aligned with the L1 cache that it's
> 32 or 16 byte in all x86), then skb->data it's advanced by skb_pull of
> 4byte aligned regions. I am not aware of an hardware protocol that uses an
> header not 32bit aligned though.
>
> Even if it could happen to have in an alien-case the checksum pool not
> 32bit aligned, that wouldn't be sure the standard case, and it would work
> _fine_ anyway, just with the cksum runing 44% slower.
Hmm, do you realize that this directly contradicts your other theory? ;)
[you can either have %esi 32bit aligned or not, but hardly both...]
But seriously, all data I have gathered so far indicates that the
buffer is always 32bit aligned. But until this is confirmed
to be guaranteed, it's too early to remove that check; esp. since
the gain is so microscopic.
> Did you ever seen my patch? The numbers are included in my email with the
> patch attacched.
Arrived while I was typing that msg.
> andl -128(%esi),%ebx # esi is 4-aligned so should be ok
> ^^^ esi won't be 4byte aligned for the
> point above
it will, see above
> To make you happy now I also reproduced in the kernel. Look this
> mini-kernel module:
> csum_partial1(buff+1, PAGE_SIZE-1, 0);
> Unable to handle kernel paging request at virtual address c281b000
> ^^^^^^^^ next page
Then don't do this :)
The routine is not supposed to be called like this. Throwing random, illegal
args at internal functions doesn't make much sense, does it?
> ------------ change topic to "having %esi 4byte aligned or not" ----------
I really am surprised that one can argue these two things in one msg,
interleaved :)
> If you would find a case where skb->data is not 4byte aligned then you
> would convince me to _write_ the code to make %esi 32bit aligned before
> entering the unrolled loop.
one of the reasons for that patch (in the iackk tarball) was to identify
such cases, but it hasn't found one. Still that doesn;t necessarily mean
they don't exist.
> And if something, we could consider to change csum_partial_copy_generic to
> try to align at least one between %esi and %edi if they are different and
> both not 32bit aligned.
the <686 csum_partial_copy_generic already does align the destination.
> Note also that %esi in the csum-copy case, is the pointer given from
> userspace. And userspace can also provide a not 32bit aligned region. But
> %edi is the skb that will just be 32bit aligned. And so you are just in
> the best condition you can have, since you can't 32bit align both %esi and
> %edi if they are different.
>
> Finally right now I am still very happy with my patch I posted in the list
> (I had not the need to change it (yet :)) and it still think it should be
> applyed to both 2.3.3 and 2.2.9 (for the record: I don't know if it will
> apply cleanly to 2.2.9 but once applyed to 2.3.3 the checksum.S of 2.3.3
> can be copyed with cp to the 2.2.9 tree to automagically fix also the %esi
> and %ebx clobbering that got fixed somewhere between 2.2.9 and 2.3.3).
2.3.3 (the 686 routines still are not used in 2.2, but there haven't been
a 2.2 release in the mean time; I assume Linus applied the patch to both
trees)
> Comments?
Now, with the alignment issue behind, how about concentrating
on improving the routines a bit more?
artur
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/