AW: what causes Machine Check exception?

From: Martin Bene (mb@sime.com)
Date: Tue Sep 26 2000 - 15:02:52 EST


Hi Albert,

> See the Signal 11 FAQ... you can start with:
>
> 1. make sure the memory in bank 4 is properly seated
> 2. make sure your case is cool enough inside
> 3. make sure your power supply is good
> 4. try new RAM
> 5. try a new motherboard

Memory: ECC Mem in all banks, status:

[root@delphi1 log]# cat /proc/ram
Chipset ECC capability : ECC detection and correction
Current ECC mode : ECC detection and correction
Bank Size Type ECC SBE MBE
0 128M SDR Y 0 0
1 128M SDR Y 0 0
2 128M SDR Y 0 0
3 128M SDR Y 0 0
Total 512M

So memory problems seem to be out. (kernel did not panic on machine check
exception, so machine is still up after reported exception)

Temperature / power:

[root@delphi1 log]# sensors
w83781d-isa-0290
Adapter: ISA adapter
Algorithm: ISA algorithm
VCore 1: +1.60 V (min = +1.56 V, max = +1.72 V)
VCore 2: +1.61 V (min = +1.56 V, max = +1.72 V)
+3.3V: +3.44 V (min = +3.13 V, max = +3.45 V)
+5V: +4.99 V (min = +4.72 V, max = +5.24 V)
+12V: +11.86 V (min = +11.36 V, max = +12.58 V)
-12V: -11.51 V (min = -11.13 V, max = -12.55 V)
-5V: -4.88 V (min = -4.74 V, max = -5.24 V)
CPU-Fan: 4623 RPM (min = 3000 RPM, div = 2)
MB temp: +34 C (limit = +60 C, hysteresis = +50 C)

Looks OK as well, no peaks / anomalys reported for the time of the error as
well. System is running with 1 case fan, 1 fan for the hd's and one for the
power supply.

Leaves faulty CPU or motherboard; I've got 4 identical systems, this is the
first strange message reported so far on any of them.

Just finished 30 minutes of compiling the kernel w/o further problems; let's
see what it says tomorrow morning.

> Then, after you find the problem, see if Oracle has a way to
> verify that nothing got corrupted. Something like fsck.oracle
> would be useful.

Doesn't look like anything got hurt; restore of last backup + transaction
logs on the backup server gives identical full database export as live
database.

> If you can't find a problem, maybe background radiation flipped
> a few bits.

Guess I'll have to take that as a possible explanation, there sure doesn't
seem to be anything else fishy going on.

Thanks for the info on meaning of the exception.

Bye, Martin

"you have moved your mouse, please reboot to make this change take effect"
--------------------------------------------------
 Martin Bene vox: +43-316-813824
 simon media fax: +43-316-813824-6
 Andreas-Hofer-Platz 9 e-mail: mb@sime.com
 8010 Graz, Austria
--------------------------------------------------
finger mb@mail.sime.com for PGP public key

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 30 2000 - 21:00:18 EST