Re: RAID 10 w AHCI w NCQ = Spurius I/O error

From: Bill Davidsen
Date: Sat Oct 27 2007 - 14:21:15 EST


Nestor A. Diaz wrote:
Hello People,

I need your help, this problem is driving me crazy.

Did you know there is a raid list?

I have created a RAID 10 using a RAID0 configuration on top of two RAID1 devices (all software raid), like this:

What you have created is a nested raid 1+0 (a raid0 striped over raid1 mirrors); the md raid10 personality is a different thing. Given your setup, raid10 is probably what you *should* have created.

Personalities : [raid0] [raid1]
md4 : active raid0 md2[0] md3[1]
605071872 blocks 64k chunks

md0 : active raid1 sdd3[3] sda3[0] sdc3[2] sdb3[1]
9791552 blocks [4/4] [UUUU]

md3 : active raid1 sdd2[2](F) sdb2[0]
302536000 blocks [2/1] [U_]

md1 : active raid1 sdd1[3] sda1[0] sdc1[2] sdb1[1]
240832 blocks [4/4] [UUUU]

md2 : active raid1 sda2[0] sdc2[1]
302536000 blocks [2/2] [UU]

unused devices: <none>
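
A listing like the one above can be scanned automatically for failed members. The following is a minimal Python sketch, assuming the /proc/mdstat layout shown in this message; the function name degraded_arrays and the SAMPLE string are illustrative, not part of any md tool.

```python
import re

# Excerpt of the /proc/mdstat output quoted in this message.
SAMPLE = """\
md4 : active raid0 md2[0] md3[1]
605071872 blocks 64k chunks

md3 : active raid1 sdd2[2](F) sdb2[0]
302536000 blocks [2/1] [U_]

md2 : active raid1 sda2[0] sdc2[1]
302536000 blocks [2/2] [UU]
"""

def degraded_arrays(mdstat_text):
    """Map each degraded array to the members flagged (F) on its header line."""
    result = {}
    current = None
    for raw in mdstat_text.splitlines():
        line = raw.strip()
        header = re.match(r"(md\d+)\s*:\s*active\s+\S+\s+(.*)", line)
        if header:
            current = header.group(1)
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            if failed:
                result[current] = failed
            continue
        # Status line such as "302536000 blocks [2/1] [U_]"; "_" marks a missing slot.
        status = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(1):
            result.setdefault(current, [])
    return result

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE.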

But the sdd device sometimes fails. I have changed the hard disk, checked the older SATA drive, and reformatted it using mke2fs -c -c (to check for media errors, both read and write); no media problems were found. I changed the SATA disk and the problem remains, even with a new SATA hard disk.

The system is a supermicro server 5015-mt+ with an ich7 ahci controller

[___snip___]

The RAID 1 builds perfectly, but five days later the system shows:

end_request: I/O error, dev sdd, sector 144006110
raid1: Disk failure on sdd2, disabling device.
Operation continuing on 1 devices
end_request: I/O error, dev sdd, sector 144006222
end_request: I/O error, dev sdd, sector 144268814
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2
disk 1, wo:1, o:0, dev:sdd2
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2
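
To see whether the failing sectors cluster on one device or one region, the end_request lines can be grouped per device. This is a small Python sketch, assuming the kernel message format quoted above; io_error_sectors and LOG are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# The end_request lines quoted in this message.
LOG = """\
end_request: I/O error, dev sdd, sector 144006110
raid1: Disk failure on sdd2, disabling device.
end_request: I/O error, dev sdd, sector 144006222
end_request: I/O error, dev sdd, sector 144268814
"""

def io_error_sectors(log_text):
    """Collect the sector numbers reported in I/O errors, keyed by device name."""
    sectors = defaultdict(list)
    pattern = r"end_request: I/O error, dev (\w+), sector (\d+)"
    for match in re.finditer(pattern, log_text):
        sectors[match.group(1)].append(int(match.group(2)))
    return dict(sectors)

if __name__ == "__main__":
    for dev, found in io_error_sectors(LOG).items():
        print(dev, len(found), "errors between sectors", min(found), "and", max(found))
```

Errors that always hit the same device but keep appearing after the disk is replaced point away from the media itself.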

Hardware error, almost certainly. If you're using a hub, I suspect that first, then cables and heat problems, then the controller, in rough order of likelihood.

A week before, I got (under 2.6.18) the following message:

[___lots more snip___]


I have updated from 2.6.18 to 2.6.22 hoping to get rid of the problem, but it remains and I don't know what the cause could be. The problem always happens on /dev/sdd. I use LVM on top of the RAID 10 software device.

I am not sure whether the problem arises because I created the RAID10 by building two RAID1 devices and then a RAID0 on top of them. Should I have used mdadm with the level 10 option instead?
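
For reference, the mdadm level 10 alternative mentioned here could be assembled roughly as follows. This is only a sketch: it builds and prints the command without running it, the target name /dev/md4 is an assumption, and the member partitions are taken from the mdstat listing earlier in this message. Running such a command would destroy the data on those partitions.

```python
import shlex

def raid10_create_command(array, members):
    """Build (but do not run) an mdadm command creating a native raid10 array."""
    return [
        "mdadm", "--create", array,
        "--level=10",
        "--raid-devices=%d" % len(members),
    ] + list(members)

if __name__ == "__main__":
    # Assumed target and members; adjust to the real system before use.
    cmd = raid10_create_command(
        "/dev/md4",
        ["/dev/sda2", "/dev/sdc2", "/dev/sdb2", "/dev/sdd2"],
    )
    print(shlex.join(cmd))
```

Printing the command first leaves room to review it before anything is written to disk.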

Any suggestions will be welcome.

Do you ever get errors in partitions which are not part of the raid0-over-raid1 setup, like md1? If not, look at your partition tables to see if you have any strange values there.

Are all drives at the same firmware level?

--
Bill Davidsen <davidsen@xxxxxxx>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
