Re: [patch] entry.S asm improvement (removed some ugly jmp)

Andrea Arcangeli (andrea@e-mind.com)
Fri, 27 Nov 1998 20:27:55 +0100 (CET)


On Fri, 27 Nov 1998, Linus Torvalds wrote:

>On Fri, 27 Nov 1998, Andrea Arcangeli wrote:
>>
>> This my patch (from arca-33) should be obviously right and will improve
>> performance...
>
>Have you actually tested it? It breaks any branch prediction hardware that
>uses a return stack.

I think it's faster because we are not doing:

--------------------------------------------------------------------
function:
...
ret

main:
...
pushl $after
jmp function
after:
...
--------------------------------------------------------------------

But we are doing:

--------------------------------------------------------------------
after:
....

function:
...
ret

main:
...
pushl $after
jmp function
/* never reached */
--------------------------------------------------------------------

This .s asm proggy _should_ simulate 2.1.130:

-----------------------------------------------------------------------
.file "p.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
after:
jmp go
function:
ret
.globl main
.type main,@function
main:
pushl %ebp
xorl %eax,%eax
movl %esp,%ebp
.p2align 4,,7
.L5:
#APP
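call function /* 2.1.130 style: a real call, so the ret pairs with it */
jmp after /* extra jump to the out-of-line continuation */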
go:
#NO_APP
incl %eax
cmpl $49999000,%eax
jle .L5
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) egcs-2.91.57 19980901 (egcs-1.1 release)"
-----------------------------------------------------------------------

This other proggy should simulate 2.1.130 + my patch:

-----------------------------------------------------------------------
.file "p.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
after:
jmp go
function:
ret
.globl main
.type main,@function
main:
pushl %ebp
xorl %eax,%eax
movl %esp,%ebp
.p2align 4,,7
.L5:
#APP
pushl $after /* patched style: push the return address by hand */
jmp function /* the ret in function then lands on after */
go:
#NO_APP
incl %eax
cmpl $49999000,%eax
jle .L5
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) egcs-2.91.57 19980901 (egcs-1.1 release)"
-----------------------------------------------------------------------
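To make the return-stack objection concrete, here is a minimal sketch of a toy return-address-stack predictor replaying the two loop bodies above. Everything in it is an assumption for illustration: the names `ret_mispredictions`, `loop_2_1_130` and `loop_patched`, the stack depth of 4, and the rule that an empty stack counts as a miss. It is not a model of any particular CPU.

```python
# Toy return-address stack (RAS) predictor. Hypothetical names and depth;
# real CPUs differ in depth and in what they do when the stack is empty.

def ret_mispredictions(trace, depth=4):
    """Count 'ret' instructions whose target the toy RAS fails to predict.

    trace is a list of (op, address) pairs:
      ("call", return_address)  pushes return_address on the RAS
      ("ret", actual_target)    pops a prediction and compares it
      anything else             leaves the RAS untouched
    """
    ras = []
    misses = 0
    for op, address in trace:
        if op == "call":
            ras.append(address)
            if len(ras) > depth:
                ras.pop(0)  # oldest entry falls off a full stack
        elif op == "ret":
            predicted = ras.pop() if ras else None
            if predicted != address:
                misses += 1
    return misses


def loop_2_1_130(iterations):
    # call function / ret / jmp after / jmp go
    body = [("call", "jmp_after"), ("ret", "jmp_after"),
            ("jmp", "after"), ("jmp", "go")]
    return body * iterations


def loop_patched(iterations):
    # pushl $after / jmp function / ret / jmp go
    body = [("push", "after"), ("jmp", "function"),
            ("ret", "after"), ("jmp", "go")]
    return body * iterations


if __name__ == "__main__":
    print("2.1.130 style:", ret_mispredictions(loop_2_1_130(1000)))  # 0
    print("patched style:", ret_mispredictions(loop_patched(1000)))  # 1000
```

In this toy model every hand-pushed return misses, which is the effect the objection describes. It says nothing about cost, and a CPU that falls back on some other predictor for the ret (not modelled here) could still guess the constant target, so only measured timings can settle the question.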

The 2.1.130 simulation takes 2.603s; the 2.1.130 + my patch simulation
takes 2.601s. This is on a P5MMX; I don't know about other CPUs (any
volunteers?). My patch also produces smaller code and looks nicer to
me ;)
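To put the two timings in per-iteration terms, here is a small back-of-the-envelope sketch. The iteration count is read off the loop (%eax goes from 0 up to 49999000 inclusive before the jle falls through); the helper name `per_iteration_ns` is made up for this example, and no CPU clock speed is assumed.

```python
# %eax starts at 0 and the body runs once for every value 0..49999000.
ITERATIONS = 49_999_000 + 1

BASELINE_S = 2.603  # 2.1.130 simulation
PATCHED_S = 2.601   # 2.1.130 + patch simulation


def per_iteration_ns(seconds, iterations=ITERATIONS):
    """Average wall-clock time per loop iteration, in nanoseconds."""
    return seconds / iterations * 1e9


baseline_ns = per_iteration_ns(BASELINE_S)
patched_ns = per_iteration_ns(PATCHED_S)
delta_ns = baseline_ns - patched_ns
relative_gain = (BASELINE_S - PATCHED_S) / BASELINE_S

if __name__ == "__main__":
    print(f"baseline: {baseline_ns:.2f} ns/iteration")
    print(f"patched:  {patched_ns:.2f} ns/iteration")
    print(f"delta:    {delta_ns:.3f} ns/iteration ({relative_gain:.3%})")
```

The gap works out to roughly 0.04 ns per iteration, under 0.1% of the total. The two numbers alone cannot show whether that is above run-to-run noise.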

Andrea Arcangeli
