Re: [patch 00/19] mutex subsystem, -V11

From: David Howells
Date: Thu Jan 05 2006 - 08:45:58 EST


Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:

> > Can you arrange .text.lock.mutex+0 here to just jump directly to
> > __mutex_lock_noinline? Otherwise we have an unnecessarily extended return
> > path.
>
> jmp is free on x86. eg zero cycles. Any trickery is more likely to cost
> because of doing unexpected things.

I'm talking about replacing:

<caller>: call
mutex_lock: lock decl
mutex_lock: js
.text.lock.mutex: call   <- extra transfer into the out-of-line stub
__mutex_lock: ret        <- returns only as far as the stub
.text.lock.mutex: jmp    <- extra transfer back again
mutex_lock: ret
<caller>: ...

With:

<caller>: call
mutex_lock: lock decl
mutex_lock: js
.text.lock.mutex: jmp    <- tail-jump, no stub CALL
__mutex_lock: ret        <- returns straight to <caller>
<caller>: ...

Or:

<caller>: call
mutex_lock: lock decl
mutex_lock: js           <- branches directly to __mutex_lock
__mutex_lock: ret        <- returns straight to <caller>
<caller>: ...

This is exactly the transformation the compiler performs when it optimises a
tail-call.
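
At the instruction level it would look something like this (a sketch only;
which register holds the mutex pointer is an assumption on my part, not taken
from the patch):

	mutex_lock:
		lock decl (%eax)	# atomic decrement of the count
		js .text.lock.mutex	# went negative: lock is contended
		ret			# uncontended: back to the caller

	.text.lock.mutex:
		jmp __mutex_lock	# tail-jump: __mutex_lock's RET now
					# returns straight to mutex_lock's caller

The caller's one CALL is matched by __mutex_lock's one RET, so nothing is left
dangling.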

> > You may not want to make the JS go directly there, but you could have that
> > go to a JMP to __mutex_lock_noinline rather than having a CALL followed by
> > a JMP back to a return instruction.
>
> unbalanced call/ret pairs are REALLY expensive on x86. The current x86
> processors all do branch prediction on the ret based on a special
> internal call stack, breaking the symmetry is thus a branch prediction
> miss, eg 40+ cycles

In what way would this be unbalanced? You end up with one fewer CALL and one
fewer RET instruction. And why would the CPU think this is any different from
a function with a conditional branch in it? E.g.:

<caller>: call
mutex_lock: lock decl
mutex_lock: js           <- not taken on the uncontended path
mutex_lock: ret
<caller>: ...

The only route to __mutex_lock would be through mutex_lock...
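
To spell out the bookkeeping (this is my understanding of how the return-stack
predictor works, hence the question below; I don't have a reference for it):

<caller>: call           <- predictor pushes <caller>'s return address
mutex_lock: lock decl
mutex_lock: js           <- an ordinary branch; the return stack is untouched
__mutex_lock: ret        <- predictor pops <caller>'s address: a correct match

Every CALL still has exactly one matching RET.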

Are there docs on this feature of the x86 anywhere?

David