Re: [Bug #14270] Cannot boot on a PIII Celeron

From: Michael Tokarev
Date: Sun Oct 04 2009 - 08:16:08 EST


Michael Tokarev wrote:
Cyrill Gorcunov wrote:
On 10/2/09, Michael Tokarev <mjt@xxxxxxxxxx> wrote:
Michael Tokarev wrote:
Cyrill Gorcunov wrote:
On 10/1/09, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
This message has been generated automatically as a part of a report
of regressions introduced between 2.6.30 and 2.6.31.

The following bug entry is on the current list of known regressions
introduced between 2.6.30 and 2.6.31. Please verify if it still should
be listed and let me know (either way).
Michael has been asked to bisect it (if possible). I can't reproduce it
in kvm, unfortunately.
Yes, and that's what I'll be trying to do shortly.
I had other issues to sort out and wasn't able to
get to it in the last few days.

Also I've a few other suspects. For example, in this .31
config I changed from bzip2 to lzma compression - and that's
where (or near where) the kernel is rebooting.
And that was the problem. After switching from LZMA
to BZIP2 the kernel boots again.

Dunno if it can be treated as a regression, but it's
definitely a bug.

Thanks for tracking it down, Michael!
Rafael, who is responsible for LZMA now?
Cc him please.

Please hold on for a while.

I switched to BZIP2, it booted fine. I switched back to LZMA -
and that one now boots too. The original bzImage, which was built
by the same compiler from the same source using the same
options, reboots.

So um... I'm now trying to reproduce it ;)

I performed about 20 kernel recompiles, and finally have some "statistics".
The problem is almost reproducible, in the sense that I was able to get 6
more cases behaving the same way (rebooting right at early boot on a Celeron).
And all 7 "non-working" cases were with ccache. I.e., about half of the ~25
compiles were done with ccache, and 7 of the resulting kernels are buggy. Not
a single failure without ccache so far.
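
As a rough check of how lopsided that split is, the sketch below computes
the chance that every bad kernel would land among the ccache builds if the
failures were unrelated to ccache. The exact split is an assumption: the
message only says "about half" of ~25, so 12 ccache builds out of 25 is used
here for illustration, and the function name is hypothetical.

```python
from math import comb

def prob_all_failures_in_group(total_builds, group_builds, failures):
    """Probability that all failures fall inside one group of builds,
    assuming failures are spread over the builds purely at random
    (hypergeometric model)."""
    if failures > group_builds:
        return 0.0
    return comb(group_builds, failures) / comb(total_builds, failures)

if __name__ == "__main__":
    # Assumed numbers: 25 builds in total, 12 of them through ccache,
    # 7 kernels that reboot at early boot.
    p = prob_all_failures_in_group(25, 12, 7)
    print(f"chance of all 7 failures hitting ccache builds: {p:.4f}")
```

Under these assumed numbers the chance is well below 1%, so the link to
ccache is unlikely to be a coincidence, even though the sample is small.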

Maybe it's some stale .o file cached by ccache (and it indeed looks like
that) -- I didn't try to remove the cache yet (but my guess is that I
won't be able to reproduce the issue with a clean cache anymore).
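
One way to test the stale-object guess before clearing the cache would be
to compare the object files of a booting build against those of a rebooting
build. The following is only a minimal sketch: it assumes both build trees
are still on disk, and the directory names in the usage note are
hypothetical.

```python
import hashlib
from pathlib import Path

def object_hashes(build_dir):
    """Map each .o file (path relative to build_dir) to its SHA-256 digest."""
    root = Path(build_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.o"))
    }

def differing_objects(good_dir, bad_dir):
    """Return the .o files present in both trees whose contents differ."""
    good = object_hashes(good_dir)
    bad = object_hashes(bad_dir)
    common = good.keys() & bad.keys()
    return sorted(name for name in common if good[name] != bad[name])
```

Calling differing_objects("build-noccache", "build-ccache") would list the
candidates. A few objects that embed the build date or build counter are
expected to differ in any case, so only the remaining entries would point
at a stale cache entry.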

What puzzles me most is the "failure mode". The difference between the
two processors is minimal. Having a corrupt .o file and an almost-working
kernel is almost impossible on its own. And hitting this difference with
a corrupt .o file is... unbelievable.

So I'm declaring it's a false alarm for now, and closing the bug.

/mjt