Re: Random reboots

From: linux-os (Dick Johnson)
Date: Wed Feb 15 2006 - 10:09:41 EST

On Wed, 15 Feb 2006, Ryan Richter wrote:

> On Tue, Feb 14, 2006 at 11:22:22PM +0100, Jean Delvare wrote:
>> You seem to have hardware monitoring drivers loaded on the system, so
>> I'd suggest that you watch the returned values over time. If the
>> hardware is going wrong it might show there. Your system could be
>> overheating for some reason (stuck fan...)
>> The fact that older kernels were seemingly working better doesn't prove
>> much. You were running these kernels before, not now, and hardware
>> *does* age, contrary to what people seem to think. If you want to make
>> certain that older kernels were indeed working better for purely
>> software reasons, you should switch back to such an old kernel and see
>> if things actually improve or not.
>> Note that the first case ("a kernel came out that fixed the problem")
>> doesn't mean that the hardware was not at fault. There are quite a few
>> quirks in the Linux kernel code which are just there to work around
>> known hardware or BIOS bugs.
> No, the old kernels still have all the bugs they ever did (of course).
> I tested it during the st-iommu-doublefree debugging. I do not plan on
> running the old kernel again, mainly because it has so many irritating
> bugs (df doesn't work, the serial console stalls on boot, so it won't
> boot without handholding, etc. etc.). I'd have to run it for at least a
> month to verify, and the old kernel has security vulnerabilities and so
> on.
> The sensors report a bunch of obvious nonsense as always... I keep them
^^^^^^^^^^^^^^^^^^^_________ Hint?

I have a "new" machine with a "Thunder" board. It started to reboot
for no good reason at all. It turned out that the plastic catch in
the fan/heatsink hold-down mechanism had broken, so the heatsink was
not tight against the CPU. I "fixed" it by tying it down with
some wire. The reboot problems and some other "strange" problems
went away. One of the strange problems was that my 'C' runtime
library got corrupted, as well as some other read-only files,
even though e2fsck never found any problems.
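
A filesystem checker verifies filesystem metadata, not file contents,
so silently corrupted read-only files will not show up there. One way to
catch that kind of damage is to record checksums of files that should
never change and compare them again later. The following is only a
minimal sketch of that idea; the function names are made up for
illustration and the choice of SHA-256 is an assumption, not something
I used on the machine described above.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large libraries do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Record a baseline digest for each file (hypothetical helper)."""
    return {p: sha256_of(p) for p in paths}


def changed(baseline, paths):
    """Return the paths whose current digest differs from the baseline."""
    return sorted(
        p for p in paths
        if p in baseline and sha256_of(p) != baseline[p]
    )
```

Take the snapshot while the machine is known to be healthy, then call
changed() with the same list of paths after a suspicious reboot. A file
that nobody wrote to but whose digest moved points at hardware rather
than at the filesystem.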

> configured in with the hope that one day they'll report useful
> information, but that day hasn't come yet. I just checked, and all the
> fans are still fine. It's in a huge case with lots of fans and it's
> hardly warmer than room temp. The opteron 240s don't put out much heat.

It's the CPU temperature that counts, not the air temperature. Check
the heatsink and its mounting hardware.
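
If the monitoring drivers are loaded anyway, the readings can be logged
over time and the obviously bogus ones set aside instead of being
ignored wholesale. What follows is a rough sketch, not a tested tool. It
assumes the driver exposes temperatures as temp*_input files under
/sys/class/hwmon/hwmon*/ in millidegrees Celsius (some drivers place
them elsewhere), and the thresholds and function names are invented for
illustration; the right limits depend on the CPU.

```python
import glob
import os


def read_temps(root="/sys/class/hwmon"):
    """Return {path: degrees Celsius} for each temp*_input file under root."""
    temps = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                # The hwmon sysfs interface reports millidegrees Celsius.
                temps[path] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            # Unreadable or malformed sensor file: skip it.
            continue
    return temps


def classify(temp_c, hot=70.0, lowest=0.0, highest=125.0):
    """Label a reading; all three thresholds are assumed values."""
    if temp_c < lowest or temp_c > highest:
        return "nonsense"
    if temp_c >= hot:
        return "hot"
    return "ok"


if __name__ == "__main__":
    for path, temp_c in read_temps().items():
        print(path, temp_c, classify(temp_c))
```

Run it from cron every minute or so and keep the output. A CPU reading
that climbs shortly before each reboot says more than the case
temperature does.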

> I'm still thoroughly convinced it's a kernel bug.
> -ryan

Dick Johnson
Penguin : Linux version on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
