Re: FYI: RAID5 unusably unstable through 2.6.14

From: Phillip Susi
Date: Fri Feb 03 2006 - 14:37:21 EST

Next message: Phillip Susi: "Re: FYI: RAID5 unusably unstable through 2.6.14"
Previous message: Lee Revell: "Re: WLAN drivers"
In reply to: Roger Heflin: "RE: FYI: RAID5 unusably unstable through 2.6.14"
Next in thread: Martin Drab: "Re: FYI: RAID5 unusably unstable through 2.6.14"

I fail to see how this is a reply to my message. I was asking for clarification on which "higher layer" supposedly caused this behavior (being unable to access any part of the disk), because as far as I know, all the higher layers are quite happy to access the non-broken parts of the disk and return the appropriate error to the calling application for the bad parts.
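To illustrate the behavior described above, here is a minimal sketch (not code from this thread) of an application reading a partially bad disk block by block. It assumes an unreadable block surfaces to userspace as EIO from the read call, which is how Linux normally reports a low-level I/O error. The function name scan_readable_blocks and its pread parameter are hypothetical and exist only so the failure can be simulated without real bad media.

```python
import errno
import os


def scan_readable_blocks(path, block_size=4096, pread=os.pread):
    """Return (good_offsets, bad_offsets) for the file or device at path.

    Hypothetical helper: pread is injectable so that a test can
    simulate a media error without needing a failing disk.
    """
    good, bad = [], []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and
        # for block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            try:
                pread(fd, length, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # not a media error, so do not hide it
                bad.append(offset)  # note the bad block and carry on
                continue
            good.append(offset)
    finally:
        os.close(fd)
    return good, bad
```

The point of the sketch is only that an error on one block does not have to stop the caller from reading the rest; whether a given controller or driver actually lets the remaining blocks through is exactly what is in question in this thread.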

Roger Heflin wrote:

That's a strange statement; maybe we could get some clarification on it? From the dmesg lines you posted before, it appeared that the hardware was failing the request with a bad disk sense code. As I said before, normally Linux has no problem reading the good parts of a partially bad disk, so I wonder exactly what Mark means by "upper layers which are only zero fault tolerant"?

Some of the fakeraid controllers will kill the disk when the
disk returns a failure like that.

On top of that usually (even if the controller were not to
kill the disk) the application will get a fatal disk error
also, causing the application to die.

The best I have been able to hope for (this is a raid0 stripe
case) is that the fakeraid controller does not kill the disk,
returns the disk error to the higher levels and lets the application
be killed. At least in this case you will likely know the disk
has a fatal error, rather than (in the raid0 case) having the
machine crash and having to debug it to determine exactly
what the nature of the failure was.

The same may need to be applied when the array is already
in degraded mode ... limping along with some lost data and messages
indicating such is a lot better than losing all of the data.

Roger

