Re: Semaphore assembly-code bug

From: Linus Torvalds
Date: Fri Oct 29 2004 - 15:47:51 EST




On Fri, 29 Oct 2004, Andreas Steinmetz wrote:
>
> If you still believe in features I can't find any manufacturer
> documentation for, well, you're Linus so it's your decision.

It's not that I'm Linus. It's that I am apparently better informed than
you are, and the numbers you are looking at are irrelevant. For example,
have you even _looked_ at the Pentium M stack engine documentation, which
is what this whole argument is all about?

And the documentation you look at is not relevant. For example, when you
look at the latency of "pop", who _cares_? That's the latency to use the
data, and has no meaning, since in this case we don't actually ever use
it. So what matters is other things entirely, like how well the
instructions can run in parallel.

Try it.

popl %eax
popl %ecx

should take one cycle on a Pentium. I pretty much _guarantee_ that

lea 4(%esp),%esp
popl %ecx

takes longer, since they have a data dependency on %esp that is hard to
break (the P4 trace-cache _may_ be able to break it, but the only CPUs that
I think are likely to break it are actually the Transmeta CPUs, which did
that kind of thing by default and _will_ parallelise the two, and even
combine the stack offsetting into one single micro-op).
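
The dependency argument above can be sketched as a toy model. This is not
a measurement and not a description of any real CPU's pairing logic: it
assumes a hypothetical in-order machine that issues two instructions in
one cycle unless the second touches a register the first writes, and it
assumes the implicit %esp update of back-to-back stack operations is
handled specially. Every name in it (struct insn, can_pair,
cycles_for_pair, the REG_* masks) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative register bitmask; these names are invented for the sketch. */
enum {
    REG_EAX = 1 << 0,
    REG_ECX = 1 << 1,
    REG_ESP = 1 << 2
};

/* One instruction, reduced to the registers it reads and writes. */
struct insn {
    unsigned reads;
    unsigned writes;
    bool stack_op;  /* true if %esp is only updated implicitly (push/pop) */
};

/* popl %eax: reads %esp for the address, writes %eax and %esp. */
const struct insn pop_eax = {
    .reads = REG_ESP, .writes = REG_ESP | REG_EAX, .stack_op = true
};

/* popl %ecx: same shape, different destination. */
const struct insn pop_ecx = {
    .reads = REG_ESP, .writes = REG_ESP | REG_ECX, .stack_op = true
};

/* lea 4(%esp),%esp: an ordinary ALU-style write to %esp. */
const struct insn lea_4_esp = {
    .reads = REG_ESP, .writes = REG_ESP, .stack_op = false
};

/*
 * Can 'second' issue in the same cycle as 'first'?
 * ASSUMPTION: when both are stack ops, the implicit %esp dependency is
 * ignored, standing in for hardware that tracks the stack offset itself.
 */
bool can_pair(const struct insn *first, const struct insn *second)
{
    unsigned first_writes = first->writes;
    unsigned second_uses = second->reads | second->writes;

    if (first->stack_op && second->stack_op)
        first_writes &= ~(unsigned)REG_ESP;

    return (first_writes & second_uses) == 0;
}

/* Toy cost: one cycle if the two instructions pair, otherwise two. */
int cycles_for_pair(const struct insn *first, const struct insn *second)
{
    assert(first != NULL && second != NULL);
    return can_pair(first, second) ? 1 : 2;
}
```

Under these assumptions, cycles_for_pair(&pop_eax, &pop_ecx) comes out as
1 and cycles_for_pair(&lea_4_esp, &pop_ecx) as 2, because the lea writes
%esp explicitly and the following pop has to wait for it. Real cycle
counts depend on the actual CPU and have to be measured.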

So my argument is that "popl" is smaller, and I doubt you can find a
machine where it's actually slower (most will take two cycles). And I am
pretty confident that I can find machines where it is faster (ie regular
Pentium).

Not that any of this matters, since there's a patch that makes all of this
moot. If it works.

Linus