On Tue, 21 Jul 2009, Linus Torvalds wrote:
[ Added Ian Lance Taylor to the cc, he knows the background, and unlike me
is competent with gcc. ]
On Tue, 21 Jul 2009, Troy Moure wrote:
I think I've found something interesting. Look at the the code generated
for edid_checksum() in driver/video/fbmon.c. This is what I see for the
-fno-strict-overflow kernel:
Ooh.
Bingo. You're 100% right, and you definitely found it (of course, there
may be _other_ cases like this, but that's certainly _one_ of the
problems, and probably the only one).
Just out of curiosity, how did you find it? Now that I know where to look,
it's very obvious in the assembler diffs, but I didn't notice it until you
pointed it out just because there is so _much_ of the diffs...
And yes, that's very much a compiler bug. And I also bet it's very easily
fixed.
The code in question is this loop:
#define EDID_LENGTH 128
unsigned char i, ...
for (i = 0; i < EDID_LENGTH; i++) {
csum += edid[i];
all_null |= edid[i];
}
and gcc -fno-strict-overflow has apparently decided that that is an
infinite loop, even though it clearly is not. So then the stupid and buggy
compiler will compile that loop (and the whole rest of the function) to
the "optimized" version that is just
loop:
jmp loop;
I even bet I know why: it looks at "unsigned char", and sees that it is an
8-bit variable, and then it looks at "i < EDID_LENGTH" and sees that it is
a _signed_ comparison (it's signed because the C type rules mean that
'unsigned char' will be extended to 'int' in an expression), and then it
decides that in a signed comparison an 8-bit entry is always going to be
smaller than 128.
Anyway, I bet we can work around the compiler bug by just changing the
type of "i" from "unsigned char" to be a plain "int".
Krzysztof? Mind testing that?
Ian? This is Linux 2.6.27.27 compiled with gcc-4.2.4. I'm not seeing the
bug in the gcc I have on my machine (gcc-4.4.0), but the bug is very clear
(once you _find_ it, which was the problem) in the binaries that Krzysztof
posted. They're still at:
http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fno-strict-overflow.bz2 (Hangs)
http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fwrapv.bz2 (OK)
http://noc.axelspringer.pl/no-strict-overflow-vs-wrapv/vmlinux-fnone.bz2 (OK)
and you can clearly see the 'edid_checksum' miscompilation in the objdump
disassembly.
BTW: here is a simple testcase for this bug:
--- fno-strict-overflow-fixed-bug.c ---
#include <stdio.h>
int main() {
unsigned char i;
for (i = 0; i < 128; i++)
printf("loop %u\n", i);
return 0;
}
--- cut here ---
The code should be compiled with:
cc -o fno-strict-overflow-fixed-bug -Os -fno-strict-overflow fno-strict-overflow-fixed-bug.c
or:
cc -o fno-strict-overflow-fixed-bug -O2 -fno-strict-overflow fno-strict-overflow-fixed-bug.c
This bug does not exist with -O1 or if the loop is controlled by "i < 127" or "i < 129".
So, we should make sure there is no
unsigned char i; (...) for (i = 0; i < 128; i++)
somewhere inside the kernel.