Ok, I had some time to do some more testing. I've used three different sets of
memory (4x32MB FPM parity SIMMs). All of this memory has been in use for over 5
months under various other kernels with no problems (much of that time under very
heavy load). All my systems consist of the following:
Tyan S1668, w/2 PPro (200mhz, 256k cache)
128mb ram (FPM, Parity)
Tyan BIOS v3.03, NO powersaving turned on in bios, ECC enabled
Buslogic BT-958 controller
Digital DE500AA ethernet card
Monolithic kernel
Using a striped (RAID_0) volume made up of 2 fast/wide Seagate Barracuda 4gig
drives, I tested kernels v2.0.29 and 2.0.30 as follows. Filled up the volume
using articles from my news server (about 1,500,000 files). Rebooted under
v2.0.29 w/ ECC enabled and ran several fsck's across the disk, no problems. Ran
several badblocks -w's across the disk, also no problems. Recopied all files over
to the new volume and ran another fsck. (Did this routine 5 times, no errors.)
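
For anyone who wants to repeat the routine above, it could be scripted roughly
like this. This is only a sketch, not what was actually run: the device name
/dev/md0, the paths, the mke2fs/mount/umount steps (badblocks -w overwrites the
volume, so something has to recreate the filesystem before the recopy) and the
function names are all assumptions for illustration.

```python
import subprocess


def build_iteration(device, source_dir, mount_point):
    """Return the commands for one pass, in order (all arguments hypothetical)."""
    return [
        ["e2fsck", "-f", "-n", device],            # forced, read-only check
        ["badblocks", "-w", device],               # destructive write test
        ["mke2fs", device],                        # assumed: recreate filesystem
        ["mount", device, mount_point],            # assumed step
        ["cp", "-a", source_dir + "/.", mount_point],  # recopy the files
        ["umount", mount_point],                   # assumed step
        ["e2fsck", "-f", "-n", device],            # check again after the copy
    ]


def run_routine(device, source_dir, mount_point, iterations=5, runner=None):
    """Run the pass `iterations` times; return (iteration, argv, status) failures.

    `runner` takes an argv list and returns an exit status. With the default
    runner a negative status means the process was killed by a signal
    (for example -11 for SIGSEGV).
    """
    if runner is None:
        runner = subprocess.call
    failures = []
    for i in range(1, iterations + 1):
        for argv in build_iteration(device, source_dir, mount_point):
            status = runner(argv)
            if status != 0:
                failures.append((i, argv, status))
                break  # skip the rest of this pass, go on to the next one
    return failures
```

Calling run_routine("/dev/md0", "/news", "/mnt/test") as root would do five
passes and return a list of the failures, which gives a count per kernel/BIOS
setting instead of a yes/no answer.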
Rebooted system, turned Parity mode (non-ECC) on in BIOS and, still under v2.0.29,
ran the above tests. No problems.
Booted (ECC) w/ 2.0.30 and went through the same procedure. Received an NMI on boot
while fscking the volume (I have all my disks set to auto fsck when mount-count
is 1). Dropped down to maintenance mode, remounted root as r/w and fscked the
RAID_0 volume. Received several ext2 errors and the fsck process died w/ Sig 11.
Rebooted system again with the same configuration, after removing all disks from
fstab except root. Ran the same routine as above; the system died w/ 2 SIG11 errors
(NMI) out of 5 iterations.
Rebooted system (2.0.30) w/ Parity mode enabled in BIOS. No problems in five iterations.
Now I'm a complete layman when dealing with kernel/hardware interactions, but this 'looks'
like the kernel can't understand the ECC mode in the Tyan BIOS. Can anyone help shed
some more light here? Any suggestions for a more quantitative test?
Stephen Costaras
stevecs@chaven.com