Re: Semaphore assembly-code bug

From: dean gaudet
Date: Fri Oct 29 2004 - 20:18:13 EST


On Fri, 29 Oct 2004, Linus Torvalds wrote:

> On Fri, 29 Oct 2004, linux-os wrote:
> > > with the following:
> > >
> > > leal 4(%esp),%esp
> >
> > Probably so because I'm pretty certain that the 'pop' (a memory
> > access) is not going to be faster than a simple register operation.
>
> Bzzt, wrong answer.
>
> It's not "simple register operation". It's really about the fact that
> modern CPUs are smarter - yet dumber - than you think. They do things
> like speculate the value of %esp in order to avoid having to calculate it,
> and it's entirely possible that "pop" is much faster, simply because I
> guarantee you that a CPU will speculate %esp correctly across a "pop", but
> the same is not necessarily true for "lea %esp".
>
> Somebody should check what the Pentium M does. It might just notice that
> "lea 4(%esp),%esp" is the same as "add 4 to esp", but it's entirely
> possible that lea will confuse its stack engine logic and cause
> stack-related address generation stalls.

it's worse than that in general -- lea typically goes through the AGU,
which has either lower throughput or longer latency than the ALUs,
depending on the x86 implementation. on p4 a lea is 4 cycles vs. 1 for
a pop. on k8 a lea is 2 cycles vs. 1 for a pop.
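
for reference, a minimal sketch of the sequences being compared, in the
same AT&T syntax as the quoted code. each one discards a single 32-bit
stack slot. the scratch register %ecx is an assumption for illustration
only (the thread does not say which register the pop targets), and the
comments describe architectural side effects, not timings:

```asm
# all three leave %esp 4 bytes higher than before

popl %ecx             # reads the slot from memory into %ecx (assumed scratch
                      # register, clobbered); EFLAGS unchanged

addl $4, %esp         # plain ALU add on %esp; overwrites the arithmetic flags

leal 4(%esp), %esp    # address computation written back to %esp;
                      # no memory access, EFLAGS unchanged
```

the architectural difference is that pop needs a free register and touches
memory, add clobbers the flags, and lea does neither, which is why lea looks
attractive on paper even where it is slower in practice.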

use pop.

-dean