Re: [PATCH] futex: eliminate cache miss from futex_hash()

From: Sebastian Andrzej Siewior
Date: Mon Oct 26 2015 - 11:22:43 EST

On 09/12/2015 11:59 AM, Ingo Molnar wrote:
> * Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
>> I think we should leave it as is.
> But ... given that these are shared-cached values (cached on all CPUs), this
> change would only be measurable in such a benchmark if the cache footprint of the
> test is just about to overflow the size of the CPU cache and the one extra cache
> line would cause cache thrashing. That is very unlikely.
> So such a change seems to make sense unless you can argue that it's _bad_ to move
> them closer to each other.

hash_futex(), ARM, gcc-5.2.1:
- three fewer instructions
- no register is pushed to / popped from the stack (lr is no longer
  needed as a scratch register)

--- futex_old.o_f.S
+++ futex_new.o_f.S
@@ -1,26 +1,23 @@
00000000 <hash_futex>:
-push {lr} ; (str lr, [sp, #-4]!)
-movw r3, #48887 ; 0xbef7
ldr r1, [r0, #8]
-movt r3, #57005 ; 0xdead
+movw r3, #48887 ; 0xbef7
ldr r2, [r0, #4]
-movw ip, #0
+movt r3, #57005 ; 0xdead
add r3, r1, r3
ldr r0, [r0]
add r2, r3, r2
-movt ip, #0
+movw ip, #0
eor r1, r3, r2
add r3, r3, r0
sub r1, r1, r2, ror #18
-ldr ip, [ip]
+movt ip, #0
eor r3, r3, r1
-movw lr, #0
+ldr r0, [ip, #4]
sub r3, r3, r1, ror #21
-sub ip, ip, #1
+ldr ip, [ip]
eor r2, r2, r3
-movt lr, #0
+sub r0, r0, #1
sub r2, r2, r3, ror #7
-ldr r0, [lr]
eor r1, r1, r2
sub r1, r1, r2, ror #16
eor r3, r3, r1
@@ -29,6 +26,6 @@
sub r3, r2, r3, ror #18
eor r1, r1, r3
sub r3, r1, r3, ror #8
-and r3, r3, ip
-add r0, r0, r3, lsl #6
-pop {pc} ; (ldr pc, [sp], #4)
+and r0, r0, r3
+add r0, ip, r0, lsl #6
+bx lr

I guess that not executing three instructions is a good thing :)
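
A minimal C sketch of the kind of source layout that would explain the
listing above. It is only a reading of the disassembly, not the patch
itself: the names struct bucket, old_queues, old_hashsize, futex_data,
setup(), lookup_old() and lookup_new() are hypothetical. The 64-byte bucket
size is taken from the "lsl #6" scaling, and the field order from the
"[ip]" / "[ip, #4]" loads. With two separate globals the compiler has to
materialise two addresses; with both values in one struct a single base
address serves both loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash bucket; 64 bytes to match "lsl #6". */
struct bucket {
	char pad[64];
};

/* Assumed old layout: two independent globals, two address loads. */
static struct bucket *old_queues;
static unsigned long old_hashsize;

/* Assumed new layout: both values reachable from one base address. */
static struct {
	struct bucket *queues;		/* offset 0 */
	unsigned long hashsize;		/* offset sizeof(void *) */
} futex_data;

/* Point both layouts at the same table; size must be a power of two. */
static void setup(struct bucket *table, unsigned long size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	old_queues = table;
	old_hashsize = size;
	futex_data.queues = table;
	futex_data.hashsize = size;
}

/* Mask the hash and index the table, old layout. */
static struct bucket *lookup_old(uint32_t hash)
{
	return &old_queues[hash & (old_hashsize - 1)];
}

/* Same computation, new layout: one base address, two nearby loads. */
static struct bucket *lookup_new(uint32_t hash)
{
	return &futex_data.queues[hash & (futex_data.hashsize - 1)];
}
```

Both functions return the same bucket for a given hash; only the number of
address constants the compiler has to build differs.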

> Thanks,
> Ingo
