RE: RAID5 unusably unstable through 2.6.14

From: Martin Drab
Date: Fri Feb 03 2006 - 13:05:35 EST


On Fri, 3 Feb 2006, Salyzyn, Mark wrote:

> Martin Drab [mailto:drab@xxxxxxxxxxxxxxxxxxx] sez:
> > no access was possible at all to that block device entirely.
>
> Then 'we' are missing an offline message (from SCSI/block or from a
> check of the controller's array status).

Besides, when the disk goes offline (which happened to me before, due to
the bad setting of the AAC_MAX_32BIT_SGBCOUNT constant in aacraid.h), the
kernel responds appropriately with messages like these:

[ 278.705813] scsi0 (0:0): rejecting I/O to offline device
[ 278.708685] Buffer I/O error on device sda2, logical block 1
[ 278.711589] lost page write due to I/O error on sda2

You can see this in my first report, from when I witnessed the array
actually going offline. The whole report is here:

http://lkml.org/lkml/2005/7/5/194

This time, however, it was different. I am 100% positive that no such
messages appeared whatsoever. Only these:

sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Hardware Error
Additional sense: Internal target failure
Info fld=0x0
end_request: I/O error, dev sda, sector <some sector number>

Nothing else.
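For reference, the "return code" in that log can be split into its fields.
What follows is a minimal sketch. It assumes the 2.6-era layout of the 32-bit
SCSI result value, which from most to least significant is: driver byte, host
byte, message byte, and SAM status byte. The function name decode_scsi_result
is hypothetical, and only a handful of constants are mapped:

```python
# Minimal sketch (assumed 2.6-era layout): split a SCSI result value, as
# printed by sd as "return code = 0x...", into its four byte fields.
# Only a few of the kernel's constants are mapped; anything else is shown
# as a raw hex value.

HOST_BYTES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
              0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x07: "DID_ERROR"}
DRIVER_BYTES = {0x00: "DRIVER_OK", 0x08: "DRIVER_SENSE"}
SAM_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}


def decode_scsi_result(result: int) -> dict:
    driver = (result >> 24) & 0xFF  # driver byte (e.g. sense data present)
    host = (result >> 16) & 0xFF    # host byte (transport/adapter outcome)
    msg = (result >> 8) & 0xFF      # message byte
    status = result & 0xFF          # SAM status byte from the target
    return {
        "driver": DRIVER_BYTES.get(driver, hex(driver)),
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": msg,
        "status": SAM_STATUS.get(status, hex(status)),
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x8000002))
```

Applied to 0x8000002, this gives DRIVER_SENSE, DID_OK, message byte 0 and
CHECK CONDITION. That is, the command reached the controller and came back
with sense data (Hardware Error, Internal target failure). This would be
consistent with the absence of any "offline device" rejection in the log.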

Martin