On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:
The blelr is just there to protect the function against negative value of r4 hence ctr.
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
The blelr is pointless since len is guaranteed to be >= 5 (assuming that
---
arch/powerpc/lib/checksum_32.S | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 6d67e05..5500704 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -26,13 +26,17 @@
_GLOBAL(ip_fast_csum)
lwz r0,0(r3)
lwzu r5,4(r3)
- addic. r4,r4,-2
+ addic. r4,r4,-4
addc r0,r0,r5
mtctr r4
blelr-
-1: lwzu r4,4(r3)
- adde r0,r0,r4
+ lwzu r5,4(r3)
+ lwzu r4,4(r3)
comment is accurate), but now it's both pointless and in the wrong place,
since you haven't yet finished the four words that you subtracted from
r4.
We can't just do blelr, we would need to fold the result first.
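For reference, the fold step mentioned here can be modelled in C roughly as follows. This is only a sketch: csum_fold32_model is a made-up name for illustration, not the kernel's helper, and it assumes the accumulator already holds the 32-bit one's-complement sum.

```c
#include <stdint.h>

/*
 * Hypothetical model of the fold: reduce a 32-bit one's-complement
 * accumulator to 16 bits with end-around carry, then complement it.
 * An early return that skips this step would hand back an unfolded sum.
 */
static uint16_t csum_fold32_model(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);   /* add high half into low half */
    sum = (sum & 0xffff) + (sum >> 16);   /* absorb the carry from above */
    return (uint16_t)~sum;
}
```

So any early exit has to branch to the fold, or duplicate it, rather than return directly.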
How about keeping the blelr, without the -, moving it after the initial
words, and changing the number of initial words to 5?
Also maybe do all
the loads up front, since many PPC chips have a three cycle load latency
rather than two.
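
To make the shape being discussed concrete, here is a rough C model of it: the five mandatory header words are loaded up front, then summed, then an early exit is taken when there are no option words, with the fold applied on both paths. It is a sketch under assumptions, not the proposed assembly: ip_fast_csum_model and fold64_model are hypothetical names, a 64-bit accumulator stands in for the carry chain (addc/adde), and the header words are treated as plain 32-bit values.

```c
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit sum down to 16 bits and complement. */
static uint16_t fold64_model(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* 64 -> 32, end-around carry */
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);       /* 32 -> 16, end-around carry */
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical model: ihl is the header length in 32-bit words (>= 5). */
static uint16_t ip_fast_csum_model(const uint32_t *hdr, unsigned int ihl)
{
    /* All five mandatory loads are issued before any add. */
    uint32_t w0 = hdr[0], w1 = hdr[1], w2 = hdr[2], w3 = hdr[3], w4 = hdr[4];
    uint64_t sum = (uint64_t)w0 + w1 + w2 + w3 + w4;
    unsigned int i;

    /* Early exit after the initial words; the result is still folded. */
    if (ihl <= 5)
        return fold64_model(sum);

    /* Remaining option words, one per iteration. */
    for (i = 5; i < ihl; i++)
        sum += hdr[i];

    return fold64_model(sum);
}
```

With a valid header (checksum field filled in) the model returns 0, which is a quick way to sanity-check any reordering of the loads.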