Re: [RFC][PATCH 0/5] arch: atomic rework

From: Linus Torvalds
Date: Fri Feb 21 2014 - 14:41:25 EST


On Fri, Feb 21, 2014 at 11:16 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Why would this be any different, especially since it's easy to
> understand both for a human and a compiler?

Btw, the actual data path may actually be semantically meaningful even
at a processor level.

For example, let's look at that gcc bugzilla that got mentioned
earlier, and let's assume that gcc is fixed to follow the "arithmetic
is always meaningful, even if it is only syntactic" rule to the letter.
So we have that gcc bugzilla use-case:

    flag ? *(q + flag - flag) : 0;

and let's say that the fixed compiler now generates the code with the
data dependency that is actually suggested in that bugzilla entry:

    and w2, w2, #0
    ldr w0, [x1, w2]

ie the CPU actually sees that address data dependency. Now everything
is fine, right?

Wrong.

It is actually quite possible that the CPU sees the "and with zero"
and *breaks the dependencies on the incoming value*.

Modern CPUs literally do things like that. Seriously. Maybe not that
particular one, but you'll sometimes find that the CPU - in the
instruction decoding phase (ie very early in the pipeline) - notices
certain patterns that generate constants, and actually drops the data
dependency on the "incoming" registers.

On x86, generating zero using "xor" on the register with itself is one
such known sequence.

Can you guarantee that powerpc doesn't do the same for "and r,r,#0"?
Or what if the compiler generated the much more obvious

    sub w2,w2,w2

for that "+flag-flag"? Are you really 100% sure that the CPU won't
notice that that is just a way to generate a zero, and doesn't depend
on the incoming values?

Because I'm not. I know CPU designers that do exactly this.
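
To make the point concrete, here is a minimal C sketch. The function
names are invented for illustration and are not from the bugzilla entry.
It shows that each "compute zero from a register" spelling yields 0 for
every input, and that the "+flag-flag" load returns exactly what a plain
load returns. Anything that can prove this, whether compiler, assembler
or instruction decoder, is free to stop caring about the incoming value.

```c
#include <stdint.h>

/* Three spellings of "make a zero out of x" (hypothetical helper names).
 * Unsigned arithmetic keeps every case well defined. The result is 0
 * no matter what x holds, so nothing downstream needs the value of x. */
static uint32_t zero_by_sub(uint32_t x) { return x - x; }
static uint32_t zero_by_and(uint32_t x) { return x & 0u; }
static uint32_t zero_by_xor(uint32_t x) { return x ^ x; }

/* The expression from the bugzilla use-case: the address "depends" on
 * flag only syntactically. q must point into an array with at least
 * flag elements after it so that q + flag stays in bounds. */
static int load_with_fake_dep(const int *q, int flag)
{
    return flag ? *(q + flag - flag) : 0;
}

/* What the expression actually means. */
static int load_plain(const int *q, int flag)
{
    return flag ? *q : 0;
}
```

An optimizing compiler will typically reduce load_with_fake_dep() to the
same code as load_plain(), and the argument above is that hardware may
do the equivalent even if the compiler is forbidden from doing so.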

So I would actually and seriously argue that the whole C standard
attempt to use a syntactic data dependency as a determination of
whether two things are serialized is wrong, and that you actually
*want* to have the compiler optimize away false data dependencies.

Because people playing tricks with "+flag-flag" and thinking that that
somehow generates a data dependency - that's *wrong*. It's not just
the compiler that decides "that's obviously nonsense, I'll optimize it
away". The CPU itself can do it.

So my "actual semantic dependency" model is seriously more likely to
be *correct*. Not just at a compiler level.
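
For contrast, a hedged sketch of what a real dependency looks like in
C11. The struct, variable and function names are assumptions made up for
this example. The reader loads a pointer and then dereferences it, so
the address of the second load genuinely is the value produced by the
first. No arithmetic identity can remove that. memory_order_consume is
the ordering the standard discussion is about. As far as I know, current
compilers implement it by strengthening it to acquire.

```c
#include <stdatomic.h>
#include <stddef.h>

struct msg {
    int payload;
};

/* Hypothetical shared slot: NULL until a message has been published. */
static _Atomic(struct msg *) published = NULL;

/* Writer: fill in the object first, then publish its address. */
static void publish(struct msg *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&published, m, memory_order_release);
}

/* Reader: the load of m->payload cannot be issued before the value of
 * m is known, because m *is* the address being loaded from. */
static int read_payload(int fallback)
{
    struct msg *m = atomic_load_explicit(&published, memory_order_consume);
    return m ? m->payload : fallback;
}
```

Here the dependency carries information: change the pointer and you
change which memory gets read. That is the property "+flag-flag" lacks.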

Btw, with any tricks like that, I would also take a second look at the
assembler and the linker. Many assemblers do some trivial
optimizations too. Are you sure that "and w2, w2, #0" really ends
up being encoded as an "and"? Maybe the assembler says "I can do that
as a 'mov w2,#0' instead"? Who knows? Even power and ARM have their
variable-sized encodings (there are some "compressed executable"
embedded power processors, and there is obviously Thumb2), and many
assemblers end up trying to use equivalent "small" instructions.

So the whole "fake data dependency" thing is just dangerous on so many levels.

MUCH more dangerous than my "actual real dependency" model.

Linus