Re: Semaphore assembly-code bug

From: Linus Torvalds
Date: Sat Oct 30 2004 - 20:44:15 EST




On Sun, 31 Oct 2004, Andi Kleen wrote:
>
> Using the long stack setup code was found to be a significant
> win when enough registers were saved (several percent in real benchmarks)
> on K8 gcc.

For _what_?

Real applications, or SpecInt?

The fact is, SpecInt is not very interesting, because it has almost _zero_
icache footprint, and it has generally big repeat-rates, and to make
matters worse, you are allowed (and everybody does) warm up the caches by
running before you actually do the benchmark run.

_None_ of these are realistic for real life workloads.

> It speed up all function calls considerably because it
> eliminates several stalls for each function entry/exit.

.. it shaves off a few cycles in the cached case, yes.

> The popls will all depend on each other because of their implicied
> reference to esp.

Which is only true on moderately stupid CPU's. Two pop's don't _really_
depend on each other in any real sense, and there are CPU's that will
happily dual-issue them, or at least not stall in between (ie the pop's
will happily keep the memory unit 100% busy).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/