Daily crashes, incorrect RAID behaviour

From: Carsten Otto
Date: Tue Aug 15 2006 - 07:34:25 EST


System specs below (iCH7R, software raid 5)

My problems continue, even with a new and good power supply.
1) The system loses a disk about every week, only a hard reboot solves that
2) In the last three nights the system lost all disk access and
trashed the file systems

Regarding 1)
The system works normally and suddenly one disk does not respond.
After a soft reboot the BIOS does not recognize the disk, here a hard
reboot helps. Whenever I start my normal system in this situation, my
file systems get trashed. I think the software raid thinks the failed
disks (which lost several hours of write accesses) is OK and then
merges the data. When I delete the disk (or create another raid on the
partitions) I can add the disk without problems. This might be a bug,
at least it is _very_ annoying.

Regarding 2)
The system works as usual, but stops whenever disk access is needed
(some cached webpages work, but ssh login does not). On the screen I
see some scrolling messages telling me:
DriveReadySeekComplete (I do not recall the exact words, sorry) for one disk
many ext3 errors ("Something % 4 != 0, inode ..., something ..., )
After a reboot with the failed disk removed (to avoid the problem of
1) the system's file system is totally corrupt, fsck.ext3 finds a lot
of errors.
In my opinion this should not happen in a raid 5, am I correct?

The hard disk all are OK, I checked them.
The system choses the failing disks at random. I do not see a pattern here.
After I reported similar problems here I got the hint to get a better
power supply. I did that (600W now) but that does not help.
However, after the upgrade to the new power supply the system worked
fine for almost two weeks (then the weekly crashes started).

System specs:
Kernel and newer
Software raid 5
Asus P5LB2 with iCH7R
Pentium D 805 (Dual Core)
2 GB PC533
4x Maxtor 300GB (Sata2)
1x Samsung 200GB (Pata)
Intel PCIe network

Thanks vor _every_ hint, I am desperate. The system is quite new and
only makes problems.
Carsten Otto
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/