I usually do a set of tests to know how well my R1 is working - 'hot'
pulling drives data cables - perhaps not the smartest, but the closest I
can get to the actual failure. On the IDE systems I've used, the chipset
usually complains a lot but the systems goes on being quite usable - even
without a raidhotremove.
On this Adaptec 2940 box, however, the SCSI subsystem complains awfully
when I hot-disconnect a drive. The system barely slows to a crawl, and
makes me wonder what a disk failure will do to it. If I raidhotremove the
partitions the system goes back to normal usability, but I then get a nice
kernel oops after a while.
(Can't paste the oops in but it's coming shortly)
Why doesn't the raid array - on the presence of a failed disk - remove the
disk altogether? I actually _thought_ it did, but the SCSI subsystem only
stopped complaining when I raidhotremoved the partitions.
After a reboot everything's fine, of course. Does this mean we need
somebody to come over and reboot the machine if a drive breaks? The
'works partially' aspect rules out a simple watchdog, IMO.
I get the feeling the raid code is at the verge of stability, and that I'm
not supposed to push it too far. When is this feeling going to wear off? I
suspect when we get raid into the main branch.
k
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/