Re: [PATCH] i386: Fix a couple busy loops inmach_wakecpu.h:wait_for_init_deassert()

From: Linus Torvalds
Date: Fri Aug 24 2007 - 13:41:53 EST




On Fri, 24 Aug 2007, Denys Vlasenko wrote:
>
> So you are ok with compiler propagating n1 to n2 here:
>
> n1 += atomic_read(x);
> other_variable++;
> n2 += atomic_read(x);
>
> without accessing x second time. What's the point? Any sane coder
> will say that explicitly anyway:

No.

This is a common mistake, and it's total crap.

Any "sane coder" will often use inline functions, macros, etc helpers to
do certain abstract things. Those things may contain "atomic_read()"
calls.

The biggest reason for compilers doing CSE is exactly the fact that many
opportunities for CSE simple *are*not*visible* on a source code level.

That is true of things like atomic_read() equally as to things like shared
offsets inside structure member accesses. No difference what-so-ever.

Yes, we have, traditionally, tried to make it *easy* for the compiler to
generate good code. So when we can, and when we look at performance for
some really hot path, we *will* write the source code so that the compiler
doesn't even have the option to screw it up, and that includes things like
doing CSE at a source code level so that we don't see the compiler
re-doing accesses unnecessarily.

And I'm not saying we shouldn't do that. But "performance" is not an
either-or kind of situation, and we should:

- spend the time at a source code level: make it reasonably easy for the
compiler to generate good code, and use the right algorithms at a
higher level (and order structures etc so that they have good cache
behaviour).

- .. *and* expect the compiler to handle the cases we didn't do by hand
pretty well anyway. In particular, quite often, abstraction levels at a
software level means that we give compilers "stupid" code, because some
function may have a certain high-level abstraction rule, but then on a
particular architecture it's actually a no-op, and the compiler should
get to "untangle" our stupid code and generate good end results.

- .. *and* expect the hardware to be sane and do a good job even when the
compiler didn't generate perfect code or there were unlucky cache miss
patterns etc.

and if we do all of that, we'll get good performance. But you really do
want all three levels. It's not enough to be good at any one level (or
even any two).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/