Re: kernel + gcc 4.1 = several problems

From: D. Hazelton
Date: Tue Jan 02 2007 - 18:42:02 EST


On Tuesday 02 January 2007 18:24, you wrote:
> On Tue, Jan 02, 2007 at 05:06:14PM -0500, D. Hazelton wrote:
> > On Tuesday 02 January 2007 16:56, Alistair John Strachan wrote:
> > > On Tuesday 02 January 2007 21:10, Adrian Bunk wrote:
> > > [snip]
> > >
> > > > > > Comparing your report and [1], it seems that if these are the
> > > > > > same problem, it's not a hardware bug but a gcc or kernel bug.
> > > > >
> > > > > This bug specifically indicates some kind of miscompilation in a
> > > > > driver, causing boot time hangs. My problem is quite different, and
> > > > > more subtle. The crash happens in the same place every time, which
> > > > > does suggest determinism (even with various options toggled on and
> > > > > off, and a 300K smaller kernel image), but it takes 8-12 hours to
> > > > > manifest and only happens with GCC 4.1.1. ...
> > > >
> > > > Sorry if my point goes a bit away from your problem:
> > > >
> > > > My point is that we have several reported problems only visible
> > > > with gcc 4.1.
> > > >
> > > > Other bug reports are e.g. [2] and [3], but they are only present
> > > > with using gcc 4.1 _and_ using -Os.
> > >
> > > I find [2] most compelling, and I can confirm that I do have the same
> > > problem with or without optimisation for size. I don't use selinux nor
> > > has it ever been enabled.
> > >
> > > At any rate, I have absolute confirmation that it is GCC 4.1.1, because
> > > with GCC 3.4.6 the same kernel I reported booting three days ago is
> > > still cheerfully working. I regularly get uptimes of 60+ days on that
> > > machine, rebooting only for kernel upgrades. 2.6.19 seems to be no
> > > worse in this regard.
> > >
> > > Perhaps fortunately, the configs I've tried have consistently failed to
> > > shake the crash, so I have a semi-reproducible test case here on C3-2
> > > hardware if somebody wants to investigate the problem (though it still
> > > takes 6-12 hours).
> >
> > The GCC code generator appears to have been rewritten between 3.4.6 and
> > 4.1.1....
> >
> > I took a look at the dump he posted and there are some minor and some
> > massive differences between the code. In one case some of the code is
> > swapped, in another there is code in the 3.4.6 version that isn't in the
> > 4.1.1... Finally the 4.1.1 version of the function has what appears to be
> > function calls and these don't appear in the code generated by 3.4.6
>
> Differences are expected since we disable unit-at-a-time for gcc < 4
> and gcc development didn't stall between 3.4 and 4.1.

Okay. Thing is that these noted differences, aside from where 4.1.1 doesn't
generate an opcode that 3.4.6 does aren't all that fatal, IMHO. The fact that
there it does generate call's rather than jumps for local pointer moves
(IIRC - been a while since I looked at the dump of pipe_poll that he
provided) might be part of the problem

> > In other words - the code generation for 4.1.1 appears to be broken when
> > it comes to generating system code.
>
> Bug number for an either already open or created by you bug in the gcc
> Bugzilla for what you claim to be a bug in gcc?

None. I didn't file a report on this because I didn't find the big, just noted
a problem that appears to occur. In this case the call's generated seem to
wrap loops - something I've never heard of anyone doing. These *might* be
causing the off-by-one that is causing the function to re-enter in the middle
of an instruction.

Seeing this I'd guess that this follows for all system-level code generated by
4.1.1 and this is exactly what I was reporting. If you'd like I'll go dig up
the dumps he posted and post the two related segments side-by-side to give
you a better example what I'm referring to.

DRH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/